# How to calculate first derivatives

In this guide, we show you how to compute first derivatives with estimagic - while introducing some core concepts.

```
import estimagic as em
import numpy as np
import pandas as pd
```

## Introduction

As in the getting started section, let's look at the sphere function \(f(x) = x^\top x\).

```
def sphere_scalar(params):
    return params**2
```

The derivative of \(f\) is given by \(f'(x) = 2 x\). With numerical derivatives, we have to specify the value of \(x\) at which we want to compute the derivative. Let’s first consider two **scalar** points \(x = 0\) and \(x=1\). We have \(f'(0) = 0\) and \(f'(1) = 2\).
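Before turning to estimagic, it can help to see what a numerical derivative does under the hood. The following is a minimal sketch of a central finite difference in plain Python (an illustration only, not estimagic's implementation):

```python
def sphere_scalar(params):
    return params**2


def central_difference(func, x, h=1e-6):
    # Central difference: (f(x + h) - f(x - h)) / (2 * h)
    return (func(x + h) - func(x - h)) / (2 * h)


print(central_difference(sphere_scalar, 0.0))  # close to f'(0) = 0
print(central_difference(sphere_scalar, 1.0))  # close to f'(1) = 2
```

estimagic automates exactly this kind of computation, including the choice of step size and method.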

To compute the derivative using estimagic, we simply pass the function `sphere_scalar` and the `params` to the function `first_derivative`:

```
fd = em.first_derivative(func=sphere_scalar, params=0)
fd["derivative"]
```

```
array(0.)
```

```
fd = em.first_derivative(func=sphere_scalar, params=1)
fd["derivative"]
```

```
array(2.)
```

Notice that the output of `first_derivative` is a dictionary containing the derivative under the key "derivative". We discuss the output in more detail below.

## Gradient and Jacobian

The scalar case from above extends directly to the multivariate case. Let’s consider two cases:

- **Gradient**: \(f_1: \mathbb{R}^N \to \mathbb{R}\)
- **Jacobian**: \(f_2: \mathbb{R}^N \to \mathbb{R}^M\)

The first derivative of \(f_1\) is usually referred to as the gradient, while the first derivative of \(f_2\) is usually called the Jacobian.
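Both objects can be built column by column with the same one-dimensional idea: perturb a single coordinate of \(x\) and difference the function values. Below is a minimal plain-numpy sketch of a forward-difference Jacobian (an illustration, not estimagic's implementation); for a scalar-valued function it returns the gradient as a (1, N) array:

```python
import numpy as np


def fd_jacobian(func, x, h=1e-6):
    # Column j of the Jacobian is (f(x + h * e_j) - f(x)) / h,
    # i.e. a forward difference in coordinate j.
    x = np.asarray(x, dtype=float)
    f0 = np.atleast_1d(func(x))
    jac = np.empty((f0.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        jac[:, j] = (np.atleast_1d(func(x + step)) - f0) / h
    return jac


def sphere(params):
    return params @ params


# The gradient of the sphere function at x = (0, 1, 2, 3) is 2x,
# so the (1, 4) result is approximately [[0, 2, 4, 6]]
print(fd_jacobian(sphere, np.arange(4)))
```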

### Gradient

Let’s again use the sphere function, but this time with a vector input. The gradient is a 1-dimensional vector of shape (N,).

```
def sphere(params):
    return params @ params
```

```
fd = em.first_derivative(sphere, params=np.arange(4))
fd["derivative"]
```

```
array([0., 2., 4., 6.])
```

### Jacobian

As an example, let's now use the function \(f(x) = (x^\top x) \cdot \begin{pmatrix}0\\1\\2\end{pmatrix}\), with \(f: \mathbb{R}^N \to \mathbb{R}^3\). The Jacobian is a 2-dimensional object of shape (M, N), where M is the output dimension.

```
def sphere_multivariate(params):
    return (params @ params) * np.arange(3)
```

```
fd = em.first_derivative(sphere_multivariate, params=np.arange(4))
fd["derivative"]
```

```
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  2.,  4.,  6.],
       [ 0.,  4.,  8., 12.]])
```

## The output of `first_derivative`

As we have already seen in the introduction, the output of `first_derivative` is a dictionary. This dictionary **always** contains the entry "derivative", which is the numerical derivative. Depending on the values of certain arguments, the dictionary may contain several additional entries.

`return_func_value`

If the argument `return_func_value` is `True`, the output dictionary will contain an additional entry under the key "func_value": the function value evaluated at the params vector.

`return_info`

If the argument `return_info` is `True`, the output dictionary will contain one or two additional entries. It will always contain the entry "func_evals", a data frame of all internally executed function evaluations. If `n_steps` is larger than 1, it will also contain "derivative_candidates", a data frame of the derivative estimates used in the Richardson extrapolation.

For an explanation of the argument `n_steps` and the Richardson method, please see the API Reference and the Richardson Extrapolation explanation in the documentation.
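To give a rough sense of the method (this is an illustration only, not estimagic's implementation): Richardson extrapolation combines derivative estimates computed with different step sizes so that leading error terms cancel. For a central difference with error of order \(h^2\), combining the estimates at \(h\) and \(h/2\) removes that leading term:

```python
import numpy as np


def central(func, x, h):
    # Plain central difference with step size h
    return (func(x + h) - func(x - h)) / (2 * h)


def richardson(func, x, h=1e-3):
    # Central differences have error O(h^2); the combination
    # (4 * D(h/2) - D(h)) / 3 cancels the leading error term.
    return (4 * central(func, x, h / 2) - central(func, x, h)) / 3


# Example with f(x) = exp(x), whose derivative at 0 is exactly 1
plain = central(np.exp, 0.0, 1e-3)
extrapolated = richardson(np.exp, 0.0, 1e-3)
print(abs(plain - 1.0), abs(extrapolated - 1.0))
```

The extrapolated estimate is several orders of magnitude more accurate than the plain central difference at the same step size.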

The objects returned when `return_info` is `True` are rarely of any use directly and can be safely ignored. However, they are necessary data for the plotting function `derivative_plot`, as explained below. For a better understanding, we print each of these additional objects once:

```
fd = em.first_derivative(
    sphere_scalar, params=0, n_steps=2, return_func_value=True, return_info=True
)
```

```
assert fd["func_value"] == sphere_scalar(0)
```

```
fd["func_evals"]
```

| sign | step_number | dim_x | dim_f | step | eval |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 1.490116e-09 | 2.220446e-18 |
| 1 | 1 | 0 | 0 | 2.980232e-09 | 8.881784e-18 |
| -1 | 0 | 0 | 0 | 1.490116e-09 | 2.220446e-18 |
| -1 | 1 | 0 | 0 | 2.980232e-09 | 8.881784e-18 |

```
fd["derivative_candidates"]
```

| method | num_term | dim_x | dim_f | der | err |
|---|---|---|---|---|---|
| forward | 1 | 0 | 0 | 4.470348e-09 | 8.467417e-08 |
| backward | 1 | 0 | 0 | -4.470348e-09 | 8.467417e-08 |
| central | 1 | 0 | 0 | 0.000000e+00 | 0.000000e+00 |

## The `params` argument

Above we used a `numpy.ndarray` as the `params` argument. In estimagic, params can be arbitrary pytrees, for example (nested) dictionaries of numbers, arrays, and pandas objects. Let's look at a few cases.

### pandas

```
params = pd.DataFrame(
    [["time_pref", "delta", 0.9], ["time_pref", "beta", 0.6], ["price", "price", 2]],
    columns=["category", "name", "value"],
).set_index(["category", "name"])
params
```

| category | name | value |
|---|---|---|
| time_pref | delta | 0.9 |
| time_pref | beta | 0.6 |
| price | price | 2.0 |

```
def sphere_pandas(params):
    return params["value"] @ params["value"]
```

```
fd = em.first_derivative(sphere_pandas, params)
fd["derivative"]
```

```
category   name
time_pref  delta    1.8
           beta     1.2
price      price    4.0
dtype: float64
```

### nested dicts

```
params = {"a": 0, "b": 1, "c": pd.Series([2, 3, 4])}
params
```

```
{'a': 0,
 'b': 1,
 'c': 0    2
 1    3
 2    4
 dtype: int64}
```

```
def dict_sphere(params):
    return params["a"] ** 2 + params["b"] ** 2 + (params["c"] ** 2).sum()
```

```
fd = em.first_derivative(
    func=dict_sphere,
    params=params,
)
fd["derivative"]
```

```
{'a': array(0.),
 'b': array(2.),
 'c': 0    4.0
 1    6.0
 2    8.0
 dtype: float64}
```

### Description of the output

The output of `first_derivative` for a general pytree is straightforward, although describing it requires some pytree terminology; please refer to the JAX documentation on pytrees.

The output tree of `first_derivative` has the same structure as the params tree. This is equivalent to the numpy case, where the gradient is a vector of shape `(len(params),)`. If, however, the params tree contains non-scalar entries, such as `numpy.ndarray`s, `pandas.Series`, or `pandas.DataFrame`s, the output is not expanded; instead, a block is created. In the above example, the entry `params["c"]` is a `pandas.Series` with 3 entries. Thus, the first derivative output contains the corresponding 3x1 block of the gradient at the position `["c"]`.
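As a quick cross-check of that block (pandas only, independent of estimagic): the analytic gradient of the `(params["c"] ** 2).sum()` term with respect to `params["c"]` is `2 * params["c"]`, which matches the `"c"` entry of the derivative output shown above:

```python
import pandas as pd

# params["c"] from the example above
c = pd.Series([2, 3, 4])

# Analytic gradient of (c ** 2).sum() with respect to c
analytic_block = 2.0 * c
print(analytic_block)  # 4.0, 6.0, 8.0 -- the "c" block of the gradient
```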

## Multiprocessing

For slow-to-evaluate functions, one may increase computation speed by running the function evaluations in parallel. This can be easily done by setting the `n_cores` argument. For example, if we wish to evaluate the function on 2 cores, we simply write

```
fd = em.first_derivative(sphere_scalar, params=0, n_cores=2)
```