Numerical differentiation#

In this tutorial, you will learn how to numerically differentiate functions with optimagic.

import numpy as np
import optimagic as om
import pandas as pd

Basic usage of first_derivative#

def fun(params):
    return params @ params
fd = om.first_derivative(
    func=fun,
    params=np.arange(5),
)

fd.derivative
array([0., 2., 4., 6., 8.])

Basic usage of second_derivative#

sd = om.second_derivative(
    func=fun,
    params=np.arange(5),
)

sd.derivative.round(3)
array([[2.001, 0.   , 0.   , 0.   , 0.   ],
       [0.   , 2.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 2.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 2.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 2.   ]])

You can parallelize#

fd = om.first_derivative(
    func=fun,
    params=np.arange(5),
    n_cores=4,
)

fd.derivative
array([0., 2., 4., 6., 8.])
sd = om.second_derivative(
    func=fun,
    params=np.arange(5),
    n_cores=4,
)

sd.derivative.round(3)
array([[2.001, 0.   , 0.   , 0.   , 0.   ],
       [0.   , 2.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 2.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 2.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 2.   ]])

params do not have to be vectors#

In optimagic, params can be arbitrary pytrees. Examples are (nested) dictionaries of numbers, arrays and pandas objects.

def dict_fun(params):
    return params["a"] ** 2 + params["b"] ** 2 + (params["c"] ** 2).sum()
fd = om.first_derivative(
    func=dict_fun,
    params={"a": 0, "b": 1, "c": pd.Series([2, 3, 4])},
)

fd.derivative
{'a': array(0.),
 'b': array(2.),
 'c': 0    4.0
 1    6.0
 2    8.0
 dtype: float64}

Description of the output#

Note. Understanding the output of the first and second derivative requires terminolgy of pytrees. Please refer to the JAX documentation of pytrees.

The output tree of first_derivative has the same structure as the params tree. Equivalent to the 1-d numpy array case, where the gradient is a vector of shape (len(params),). If, however, the params tree contains non-scalar entries like numpy.ndarray’s, pandas.Series’, or pandas.DataFrame’s, the output is not expanded but a block is created instead. In the above example, the entry params["c"] is a pandas.Series with 3 entries. Thus, the first derivative output contains the corresponding 3x1-block of the gradient at the position ["c"]:

fd.derivative["c"]
0    4.0
1    6.0
2    8.0
dtype: float64
sd = om.second_derivative(
    func=dict_fun,
    params={"a": 0, "b": 1, "c": pd.Series([2, 3, 4])},
)

sd.derivative
{'a': {'a': array(2.00072215),
  'b': array(0.),
  'c': 0    0.0
  1    0.0
  2    0.0
  dtype: float64},
 'b': {'a': array(0.),
  'b': array(1.9999955),
  'c': 0    0.0
  1    0.0
  2    0.0
  dtype: float64},
 'c': {'a': 0    0.0
  1    0.0
  2    0.0
  dtype: float64,
  'b': 0    0.0
  1    0.0
  2    0.0
  dtype: float64,
  'c':           0         1         2
  0  1.999995  0.000004 -0.000003
  1  0.000004  2.000001  0.000000
  2 -0.000003  0.000000  1.999997}}

Description of the output#

Note. Understanding the output of the first and second derivative requires terminolgy of pytrees. Please refer to the JAX documentation of pytrees.

The output of second_derivative when using a general pytrees looks more complex but is easy once we remember that the second derivative is equivalent to applying the first derivative twice.

The output tree is a product of the params tree with itself. This is equivalent to the 1-d numpy array case, where the hessian is a matrix of shape (len(params), len(params)). If, however, the params tree contains non-scalar entries like numpy.ndarray’s, pandas.Series’, or pandas.DataFrame’s, the output is not expanded but a block is created instead. In the above example, the entry params["c"] is a 3-dimensional pandas.Series. Thus, the second derivative output contains the corresponding 3x3-block of the hessian at the position ["c"]["c"]:

sd.derivative["c"]["c"].round(3)
0 1 2
0 2.0 0.0 -0.0
1 0.0 2.0 0.0
2 -0.0 0.0 2.0

There are many options#

You can choose which finite difference method to use, whether we should respect parameter bounds, or whether to evaluate the function in parallel. Let’s go through some basic examples.

You can choose the difference method#

Note. A mathematical explanation of the background of the difference methods can be found on the corresponding explanation page.

fd = om.first_derivative(
    func=fun,
    params=np.arange(5),
    method="backward",  # default: 'central'
)

fd.derivative
array([-0.        ,  2.        ,  4.        ,  6.        ,  7.99999994])
sd = om.second_derivative(
    func=fun,
    params=np.arange(5),
    method="forward",  # default: 'central_cross'
)

sd.derivative.round(3)
array([[2.006, 0.   , 0.   , 0.   , 0.   ],
       [0.   , 2.   , 0.   , 0.   , 0.   ],
       [0.   , 0.   , 2.   , 0.   , 0.   ],
       [0.   , 0.   , 0.   , 2.   , 0.   ],
       [0.   , 0.   , 0.   , 0.   , 2.   ]])

You can add bounds#

params = np.arange(5)

fd = om.first_derivative(
    func=fun,
    params=params,
    # forces first_derivative to use forward differences
    bounds=om.Bounds(lower=params, upper=params + 1),
)

fd.derivative
array([0.        , 2.        , 4.        , 6.        , 8.00000006])

Of course, bounds also work in second_derivative.