# How to select a local optimizer

This guide explains how to choose a local optimizer that works well for your problem. Depending on your strategy for global optimization, it is also relevant for global optimization problems.

## Important facts

- There is no optimizer that works well for all problems.
- Making the right choice can lead to enormous speedups.
- Making the wrong choice can mean that you don't solve your problem at all. Sometimes, optimizers fail silently!

## The three steps for selecting algorithms

Algorithm selection is a mix of theory and experimentation. We recommend the following steps:

1. **Theory**: Based on the properties of your problem, start with 3 to 5 candidate algorithms. You may use the decision tree below.
2. **Experiments**: Run the candidate algorithms for a small number of function evaluations and compare the results in a *criterion plot*. As a rule of thumb, use between `n_params` and `10 * n_params` evaluations.
3. **Optimization**: Re-run the algorithm with the best results until convergence. Use the best parameter vector from the experiments as start parameters.

We will walk you through the steps in an example below. These steps work well for most problems but sometimes you need variations.

## A decision tree

This is a practical guide for narrowing down the set of algorithms to experiment with:

```mermaid
graph LR
    classDef highlight fill:#FF4500;
    A["Do you have<br/>nonlinear<br/>constraints?"] -- yes --> B["Is your objective<br/>function<br/>differentiable?"]
    B -- yes --> C["ipopt<br/>nlopt_slsqp<br/>scipy_trust_constr"]
    B -- no --> D["scipy_cobyla<br/>nlopt_cobyla"]
    A -- no --> E["Can you exploit<br/>a least-squares<br/>structure?"]
    E -- yes --> F["differentiable?"]
    E -- no --> G["differentiable?"]
    F -- yes --> H["scipy_ls_lm<br/>scipy_ls_trf<br/>scipy_ls_dogleg"]
    F -- no --> I["nag_dflos<br/>pounders<br/>tao_pounders"]
    G -- yes --> J["scipy_lbfgsb<br/>nlopt_lbfgsb<br/>fides"]
    G -- no --> K["nlopt_bobyqa<br/>nlopt_neldermead<br/>neldermead_parallel"]
```

Going through the different questions will give you a list of candidate algorithms. All algorithms in that list are designed for the same problem class but use different approaches to solve the problem. Which of them works best for your problem can only be found out through experimentation.
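For illustration, the tree's logic can be written down as a tiny helper that maps the three yes/no answers to the candidate lists above (a hypothetical function for this guide, not part of optimagic):

```python
def candidate_algorithms(nonlinear_constraints, least_squares, differentiable):
    """Map the decision-tree answers to candidate algorithm names.

    Illustrative helper, not part of optimagic; the returned names
    match the leaves of the decision tree above.
    """
    if nonlinear_constraints:
        if differentiable:
            return ["ipopt", "nlopt_slsqp", "scipy_trust_constr"]
        return ["scipy_cobyla", "nlopt_cobyla"]
    if least_squares:
        if differentiable:
            return ["scipy_ls_lm", "scipy_ls_trf", "scipy_ls_dogleg"]
        return ["nag_dflos", "pounders", "tao_pounders"]
    if differentiable:
        return ["scipy_lbfgsb", "nlopt_lbfgsb", "fides"]
    return ["nlopt_bobyqa", "nlopt_neldermead", "neldermead_parallel"]


# The Trid example below: no constraints, no least-squares structure, differentiable
print(candidate_algorithms(False, False, True))
```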

Note

Many books on numerical optimization focus strongly on the inner workings of algorithms. They will, for example, describe the difference between a trust-region algorithm and a line-search algorithm in a lot of detail. We have an intuitive explanation of this too. Understanding these details is important for configuring and troubleshooting optimizations, but not for algorithm selection. For example, if you have a scalar, differentiable problem without nonlinear constraints, the decision tree suggests `fides` and two variants of `lbfgsb`. `fides` is a trust-region algorithm, while `lbfgsb` is a line-search algorithm. Both are designed to solve the same kinds of problems, and which one works best needs to be found out through experimentation.

## An example problem

As an example we use the Trid function. The Trid function has no local minimum except the global one. It is defined for any number of dimensions; we will pick 20. As starting values we will pick the vector [0, 1, …, 19].

A Python implementation of the function and its gradient looks like this:

```
import warnings
warnings.filterwarnings("ignore")
```

```
import numpy as np

import optimagic as om


def trid_scalar(x):
    """Implement Trid function: https://www.sfu.ca/~ssurjano/trid.html."""
    return ((x - 1) ** 2).sum() - (x[1:] * x[:-1]).sum()


def trid_gradient(x):
    """Calculate gradient of trid function."""
    l1 = np.insert(x, 0, 0)
    l1 = np.delete(l1, [-1])
    l2 = np.append(x, 0)
    l2 = np.delete(l2, [0])
    return 2 * (x - 1) - l1 - l2
```
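Before trusting a hand-coded gradient, it is worth checking it against finite differences. The Trid function also has a known analytic minimum, x_i = i(d + 1 - i) with value -d(d + 4)(d - 1)/6, which is -1520 for d = 20; that gives a second sanity check. A numpy-only sketch (the helper `finite_difference_gradient` is ours, not an optimagic function; the Trid functions are repeated so the snippet is self-contained):

```python
import numpy as np


def trid_scalar(x):
    return ((x - 1) ** 2).sum() - (x[1:] * x[:-1]).sum()


def trid_gradient(x):
    l1 = np.delete(np.insert(x, 0, 0), [-1])
    l2 = np.delete(np.append(x, 0), [0])
    return 2 * (x - 1) - l1 - l2


def finite_difference_gradient(f, x, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.empty_like(x, dtype=float)
    for i in range(len(x)):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


x = np.arange(20, dtype=float)
assert np.allclose(trid_gradient(x), finite_difference_gradient(trid_scalar, x))

# Analytic optimum of the d-dimensional Trid function: x_i = i * (d + 1 - i)
d = 20
x_opt = np.array([i * (d + 1 - i) for i in range(1, d + 1)], dtype=float)
assert np.isclose(trid_scalar(x_opt), -d * (d + 4) * (d - 1) / 6)  # -1520 for d=20
assert np.allclose(trid_gradient(x_opt), 0)
```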

### Step 1: Theory

Let’s go through the decision tree for the Trid function:

- **No** nonlinear constraints our solution needs to satisfy.
- **No** least-squares structure we can exploit.
- **Yes**, the function is differentiable. We even have a closed-form gradient that we would like to use.

We therefore end up with the candidate algorithms `scipy_lbfgsb`, `nlopt_lbfgsb`, and `fides`.

Note

If your function is differentiable but you do not have a closed-form gradient (yet), we suggest using at least one gradient-based optimizer and one gradient-free optimizer in your experiments. Optimagic will use numerical gradients in that case. For details, see here.

### Step 2: Experiments

To find out which algorithms work well for our problem, we simply run optimizations with all candidate algorithms in a loop and store the results in a dictionary. We limit the number of function evaluations to 8. Since some algorithms only support a maximum number of iterations as a stopping criterion, we also limit the number of iterations to 8.

```
results = {}
for algo in ["scipy_lbfgsb", "nlopt_lbfgsb", "fides"]:
    results[algo] = om.minimize(
        fun=trid_scalar,
        jac=trid_gradient,
        params=np.arange(20),
        algorithm=algo,
        algo_options={"stopping_maxfun": 8, "stopping_maxiter": 8},
    )

fig = om.criterion_plot(results, max_evaluations=8)
fig.show(renderer="png")
```

All optimizers work pretty well here, and since this is a very simple problem, any of them would probably find the optimum in a reasonable time. However, `nlopt_lbfgsb` is a bit better than the others, so we will select it for the next step. In more difficult examples, the difference between optimizers can be much more pronounced.

### Step 3: Optimization

All that is left to do is to run the optimization until convergence with the best optimizer. To avoid duplicated calculations, we can already start from the previously best parameter vector:

```
best_x = results["nlopt_lbfgsb"].params

results["nlopt_lbfgsb_complete"] = om.minimize(
    fun=trid_scalar,
    jac=trid_gradient,
    params=best_x,
    algorithm="nlopt_lbfgsb",
)
```

Looking at the result in a criterion plot we can see that the optimizer converges after a bit more than 30 function evaluations.

```
fig = om.criterion_plot(results)
fig.show(renderer="png")
```

## Variations of the three steps

The three steps described above work very well in most situations. However, sometimes it makes sense to deviate:

- If you are unsure about some of the questions in step 1, select more algorithms for the experimentation phase and run more than one algorithm until convergence.
- If it is very important to find a precise optimum, run more than one algorithm until convergence.
- If you have a very fast objective function, simply run all candidate algorithms until convergence.
- If you have a differentiable objective function but no closed-form derivative, use at least one gradient-based optimizer and one gradient-free optimizer in the experiments. See here to learn more about derivatives.

## How important was it?

The Trid function is differentiable and very well behaved in almost every aspect. Moreover, it has a very short runtime. One would think that any optimizer can find its optimum. So let’s compare the selected optimizer with a few others:

```
results = {}
for algo in ["nlopt_lbfgsb", "scipy_neldermead", "scipy_cobyla"]:
    results[algo] = om.minimize(
        fun=trid_scalar,
        jac=trid_gradient,
        params=np.arange(20),
        algorithm=algo,
    )

fig = om.criterion_plot(results)
fig.show(renderer="png")
```

We can see that our chosen optimizer solves the problem with fewer than 35 function evaluations. At this point, the two gradient-free optimizers have not yet made significant progress. COBYLA gets reasonably close to an optimum after about 4k evaluations. Nelder-Mead gets stuck after 8k evaluations and fails to solve the problem.

This example shows not only that the choice of optimizer is important, but also that the commonly held belief that gradient-free optimizers are generally more robust than gradient-based ones is dangerous! The Nelder-Mead algorithm did "converge" and reports success, but it did not find the optimum. It did not even get stuck in a local optimum, because we know that the Trid function has no local optima except the global one. It just got stuck somewhere.

```
results["scipy_neldermead"].success
```

```
True
```
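The same silent failure can be reproduced without optimagic by calling SciPy's Nelder-Mead directly (a sketch; it assumes `scipy` is installed, and that optimagic's `scipy_neldermead` wraps this method). With SciPy's default stopping criteria, the optimizer terminates far from the known minimum of -1520:

```python
import numpy as np
from scipy.optimize import minimize


def trid_scalar(x):
    """20-dimensional Trid function; global minimum value is -1520."""
    return ((x - 1) ** 2).sum() - (x[1:] * x[:-1]).sum()


# Gradient-free Nelder-Mead with SciPy's default stopping criteria
res = minimize(trid_scalar, x0=np.arange(20, dtype=float), method="Nelder-Mead")

# Compare the reported status with the distance to the true minimum of -1520
print(res.success, res.fun)
```

Whether `res.success` is reported as `True` depends on which stopping criterion triggers first, which is exactly why checking the objective value against external knowledge (where available) is safer than trusting the success flag.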