Numerical differentiation#
In this tutorial, you will learn how to numerically differentiate functions with optimagic.
import numpy as np
import pandas as pd
import optimagic as om
Basic usage of first_derivative
#
def fun(params):
return params @ params
fd = om.first_derivative(
func=fun,
params=np.arange(5),
)
fd.derivative
array([0., 2., 4., 6., 8.])
Basic usage of second_derivative
#
sd = om.second_derivative(
func=fun,
params=np.arange(5),
)
sd.derivative.round(3)
array([[2.001, 0. , 0. , 0. , 0. ],
[0. , 2. , 0. , 0. , 0. ],
[0. , 0. , 2. , 0. , 0. ],
[0. , 0. , 0. , 2. , 0. ],
[0. , 0. , 0. , 0. , 2. ]])
You can parallelize#
fd = om.first_derivative(
func=fun,
params=np.arange(5),
n_cores=4,
)
fd.derivative
array([0., 2., 4., 6., 8.])
sd = om.second_derivative(
func=fun,
params=np.arange(5),
n_cores=4,
)
sd.derivative.round(3)
array([[2.001, 0. , 0. , 0. , 0. ],
[0. , 2. , 0. , 0. , 0. ],
[0. , 0. , 2. , 0. , 0. ],
[0. , 0. , 0. , 2. , 0. ],
[0. , 0. , 0. , 0. , 2. ]])
params
do not have to be vectors#
In optimagic, params can be arbitrary pytrees. Examples are (nested) dictionaries of numbers, arrays and pandas objects.
def dict_fun(params):
return params["a"] ** 2 + params["b"] ** 2 + (params["c"] ** 2).sum()
fd = om.first_derivative(
func=dict_fun,
params={"a": 0, "b": 1, "c": pd.Series([2, 3, 4])},
)
fd.derivative
{'a': array(0.),
'b': array(2.),
'c': 0 4.0
1 6.0
2 8.0
dtype: float64}
Description of the output#
Note. Understanding the output of the first and second derivative requires terminolgy of pytrees. Please refer to the JAX documentation of pytrees.
The output tree of first_derivative
has the same structure as the params tree.
Equivalent to the 1-d numpy array case, where the gradient is a vector of shape
(len(params),)
. If, however, the params tree contains non-scalar entries like
numpy.ndarray
’s, pandas.Series
’, or pandas.DataFrame
’s, the output is not expanded
but a block is created instead. In the above example, the entry params["c"]
is a
pandas.Series
with 3 entries. Thus, the first derivative output contains the
corresponding 3x1-block of the gradient at the position ["c"]
:
fd.derivative["c"]
0 4.0
1 6.0
2 8.0
dtype: float64
sd = om.second_derivative(
func=dict_fun,
params={"a": 0, "b": 1, "c": pd.Series([2, 3, 4])},
)
sd.derivative
{'a': {'a': array(2.00072215),
'b': array(0.),
'c': 0 0.0
1 0.0
2 0.0
dtype: float64},
'b': {'a': array(0.),
'b': array(1.9999955),
'c': 0 0.0
1 0.0
2 0.0
dtype: float64},
'c': {'a': 0 0.0
1 0.0
2 0.0
dtype: float64,
'b': 0 0.0
1 0.0
2 0.0
dtype: float64,
'c': 0 1 2
0 1.999995 0.000004 -0.000003
1 0.000004 2.000001 0.000000
2 -0.000003 0.000000 1.999997}}
Description of the output#
Note. Understanding the output of the first and second derivative requires terminolgy of pytrees. Please refer to the JAX documentation of pytrees.
The output of second_derivative
when using a general pytrees looks more complex but
is easy once we remember that the second derivative is equivalent to applying the first
derivative twice.
The output tree is a product of the params tree with itself. This is equivalent to the
1-d numpy array case, where the hessian is a matrix of shape
(len(params), len(params))
. If, however, the params tree contains non-scalar entries
like numpy.ndarray
’s, pandas.Series
’, or pandas.DataFrame
’s, the output is not
expanded but a block is created instead. In the above example, the entry params["c"]
is a 3-dimensional pandas.Series
. Thus, the second derivative output contains the
corresponding 3x3-block of the hessian at the position ["c"]["c"]
:
sd.derivative["c"]["c"].round(3)
0 | 1 | 2 | |
---|---|---|---|
0 | 2.0 | 0.0 | -0.0 |
1 | 0.0 | 2.0 | 0.0 |
2 | -0.0 | 0.0 | 2.0 |
There are many options#
You can choose which finite difference method to use, whether we should respect parameter bounds, or whether to evaluate the function in parallel. Let’s go through some basic examples.
You can choose the difference method#
Note. A mathematical explanation of the background of the difference methods can be found on the corresponding explanation page.
fd = om.first_derivative(
func=fun,
params=np.arange(5),
method="backward", # default: 'central'
)
fd.derivative
array([-0. , 2. , 4. , 6. , 7.99999994])
sd = om.second_derivative(
func=fun,
params=np.arange(5),
method="forward", # default: 'central_cross'
)
sd.derivative.round(3)
array([[2.006, 0. , 0. , 0. , 0. ],
[0. , 2. , 0. , 0. , 0. ],
[0. , 0. , 2. , 0. , 0. ],
[0. , 0. , 0. , 2. , 0. ],
[0. , 0. , 0. , 0. , 2. ]])
You can add bounds#
params = np.arange(5)
fd = om.first_derivative(
func=fun,
params=params,
# forces first_derivative to use forward differences
bounds=om.Bounds(lower=params, upper=params + 1),
)
fd.derivative
array([0. , 2. , 4. , 6. , 8.00000006])
Of course, bounds also work in second_derivative
.