How to speed up your optimization using derivatives#

Many optimization algorithms use derivatives to find good search directions. If you use a derivative-based optimizer but do not provide the derivative of your objective function, optimagic calculates a numerical derivative for you.

While this numerical derivative is usually precise enough to find good search directions, it requires n + 1 evaluations of the objective function (where n is the number of free parameters). For large n, this becomes very slow.

This how-to guide shows how you can speed up your optimization by parallelizing numerical derivatives or by providing closed form derivatives.

Parallel numerical derivatives#

If you have a computer with a few idle cores, the easiest way to speed up your optimization with a gradient-based optimizer is to calculate numerical derivatives in parallel:

import numpy as np
import optimagic as om


def sphere(x):
    return x @ x


res = om.minimize(
    fun=sphere,
    params=np.arange(5),
    algorithm="scipy_lbfgsb",
    numdiff_options=om.NumdiffOptions(n_cores=6),
)
res.params.round(6)
array([ 0.,  0., -0., -0., -0.])

Of course, for this super fast objective function, parallelizing will not yield an actual speedup. But if your objective function takes 100 milliseconds or longer to evaluate, you can parallelize efficiently on up to n + 1 cores.
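For illustration, here is a sketch with a hypothetical slow_sphere function that simulates a 100 millisecond evaluation. With five free parameters, up to six cores can be used productively:

import time


def slow_sphere(x):
    # Pretend each evaluation takes 100 milliseconds.
    time.sleep(0.1)
    return x @ x


res = om.minimize(
    fun=slow_sphere,
    params=np.arange(5),
    algorithm="scipy_lbfgsb",
    numdiff_options=om.NumdiffOptions(n_cores=6),
)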

Custom derivatives#

If you don’t want to solve your speed problem by throwing more compute at it, you can provide a derivative to optimagic that is faster than doing n + 1 evaluations of fun. Here we show you how to hand-code it; in practice, you would usually create the derivative with JAX or another autodiff framework (a sketch follows below).

def sphere_gradient(x):
    return 2 * x


res = om.minimize(
    fun=sphere,
    params=np.arange(5),
    algorithm="scipy_lbfgsb",
    jac=sphere_gradient,
)
res.params.round(6)
array([ 0., -0., -0., -0., -0.])

In this example, the evaluation of sphere_gradient is even faster than evaluating sphere.
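As a sketch of the autodiff route (assuming jax is installed), you could let JAX generate the gradient instead of hand-coding it. Note that the params need to be floats so that JAX can differentiate:

import jax


res = om.minimize(
    fun=sphere,
    params=np.arange(5.0),  # float params, so that JAX can differentiate
    algorithm="scipy_lbfgsb",
    jac=jax.grad(sphere),
)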

For non-trivial functions, there are often synergies between calculating the objective value and its derivative. Therefore, you can also provide a single function that evaluates both at the same time.

def sphere_fun_and_gradient(x):
    return x @ x, 2 * x


res = om.minimize(
    fun=sphere,
    params=np.arange(5),
    algorithm="scipy_lbfgsb",
    fun_and_jac=sphere_fun_and_gradient,
)

fun_and_jac can be provided in addition to or instead of jac. Providing both gives optimagic more opportunities to save time by evaluating only what the chosen optimizer needs.
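Building on the snippets above, passing both looks like this:

res = om.minimize(
    fun=sphere,
    params=np.arange(5),
    algorithm="scipy_lbfgsb",
    jac=sphere_gradient,
    fun_and_jac=sphere_fun_and_gradient,
)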

Derivatives with flexible params#

Derivatives are compatible with any format of params. In general, the gradient has the same structure as your params.

def dict_fun(x):
    return x["a"] ** 2 + x["b"] ** 4


def dict_gradient(x):
    return {"a": 2 * x["a"], "b": 4 * x["b"] ** 3}


res = om.minimize(
    fun=dict_fun,
    params={"a": 1, "b": 2},
    algorithm="scipy_lbfgsb",
    jac=dict_gradient,
)
res.params
{'a': 2.2463462376344547e-07, 'b': 0.005724250649959959}

This is also the convention that JAX uses, so any derivative you get via JAX will be compatible with optimagic.
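As a sketch (assuming jax is installed), jax.grad applied to dict_fun returns a gradient dict with the same keys as the params, so it can be passed directly as jac:

import jax


res = om.minimize(
    fun=dict_fun,
    params={"a": 1.0, "b": 2.0},  # floats, so that JAX can differentiate
    algorithm="scipy_lbfgsb",
    jac=jax.grad(dict_fun),
)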

Derivatives for least-squares functions#

When minimizing least-squares functions, you don’t need the gradient of the objective value but the jacobian of the least-squares residuals. Moreover, this jacobian function needs to be decorated with the mark.least_squares decorator.

@om.mark.least_squares
def ls_sphere(params):
    return params


@om.mark.least_squares
def ls_sphere_jac(params):
    return np.eye(len(params))


res = om.minimize(
    fun=ls_sphere,
    params=np.arange(3),
    algorithm="scipy_ls_lm",
    jac=ls_sphere_jac,
)
res.params.round(5)
array([0., 0., 0.])

The fun_and_jac argument works analogously to the scalar case.
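A sketch of what this could look like, assuming the mark.least_squares decorator is applied to the combined function as well:

@om.mark.least_squares
def ls_sphere_fun_and_jac(params):
    # Return the residuals and their jacobian in one call.
    return params, np.eye(len(params))


res = om.minimize(
    fun=ls_sphere,
    params=np.arange(3),
    algorithm="scipy_ls_lm",
    fun_and_jac=ls_sphere_fun_and_jac,
)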

Derivatives of least-squares functions again work with all valid formats of params. However, the structure of the jacobian can be a bit complicated. Again, JAX will do the right thing here, so we strongly suggest you calculate all your jacobians via JAX, especially if your params are not a flat numpy array.
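To see why the structure gets complicated, here is a sketch (assuming jax is installed) with a hypothetical least-squares function whose residuals are a dict: jax.jacobian returns a nested dict whose outer level follows the residual structure and whose inner level follows the params.

import jax


def dict_residuals(x):
    # Hypothetical residuals with dict-valued params.
    return {"r1": x["a"] ** 2, "r2": x["b"] ** 3}


jac_tree = jax.jacobian(dict_residuals)({"a": 1.0, "b": 2.0})
# jac_tree has the form {'r1': {'a': ..., 'b': ...}, 'r2': {'a': ..., 'b': ...}}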

Derivatives that work for scalar and least-squares optimizers#

If you want to switch seamlessly between scalar and least-squares optimizers, you can do so by providing even more versions of the derivative to minimize. You probably won’t ever need this, but here is how it works. To pretend that this can be useful, we compare a scalar and a least-squares optimizer in a criterion_plot:

results = {}
for algorithm in ["scipy_lbfgsb", "scipy_ls_lm"]:
    results[algorithm] = om.minimize(
        fun=ls_sphere,
        params=np.arange(5),
        algorithm=algorithm,
        jac=[sphere_gradient, ls_sphere_jac],
    )

fig = om.criterion_plot(results)
fig.show(renderer="png")
[Figure: criterion plot comparing scipy_lbfgsb and scipy_ls_lm on the sphere problem]

We see that both optimizers were super fast in solving this problem (mainly because the problem is so simple), and in this case the scalar optimizer was even faster. However, in non-trivial problems it almost always pays off to exploit the least-squares structure if you can.