How to add optimizers to optimagic#

This is a hands-on guide that shows you how to use custom optimizers with optimagic or how to contribute an optimizer to the optimagic library.

We have many examples of optimizers that are already part of optimagic and you can learn a lot from looking at those. However, only looking at the final results might be a bit intimidating and does not show the process of exploring a new optimizer library and gradually developing a wrapper.

This guide fills that gap. It tells the story of how the pygmo_gaco optimizer was added to optimagic by someone who was unfamiliar with pygmo and the gaco algorithm.

The steps of adding an algorithm are roughly as follows:

  1. Understand how to use the algorithm: Play around with the algorithm you want to add in a notebook and solve some simple problems with it. Only move on to the next step after you have a solid understanding. This step is completely unrelated to optimagic and only about the algorithm implementation you want to wrap.

  2. Understand how the algorithm works: Read documentation, research papers and other resources to find out why this algorithm was created and what problems it is supposed to solve really well.

  3. Implement the minimal wrapper: Learn about the om.mark.minimizer decorator as well as the om.InternalOptimizationProblem and the om.Algorithm classes. Implement a minimal version of your wrapper and test it.

  4. Complete and refactor the wrapper: Make sure that all convergence criteria, stopping criteria, and tuning parameters the algorithm supports can be passed to your wrapper. Also check that the algorithm gets everything it needs to achieve maximum performance (e.g. closed form derivatives and batch function evaluators). Now is also the time to clean up and refactor your code, especially if you wrap multiple optimizers from a library.

  5. Align the wrapper with optimagic conventions: Use harmonized names wherever a convention exists. Think about good names everywhere else. Set stopping criteria similar to other optimizers and try to adhere to our design philosophy when it comes to tuning parameters.

  6. Integrate your code into optimagic: Learn how to add an optional dependency to optimagic, where you need to put your code and how to add tests and documentation.

Gen AI Policy#

It is ok to use GenAI and AI-based coding assistants to speed up the process of adding an optimizer to optimagic. They can be very useful for steps 1 and 2. However, AI models often fail completely when filling out the arguments of om.mark.minimizer, when you ask them to come up with good names for tuning parameters, or when you use them to auto-generate documentation.

Even for steps 1 and 2 you should not use an AI model naively; upload a paper or documentation page to provide context to the AI.

Our policy is therefore:

  1. Only use AI for drafts that you double-check; never rely on AI to produce correct results

  2. Be transparent about your use of AI

We will reject all Pull Requests that violate this policy.

1. Understand how to use the algorithm#

Understanding how to use an algorithm means that you are at least able to solve a simple optimization problem (like a sphere or Rosenbrock function) with it.

The best starting points for this are usually tutorials or example notebooks from the documentation. An AI model can also be a good source of examples.

The things you need to find out for any new algorithm are:

  1. How to code up the objective function

  2. How to run an optimization at default values

  3. How to pass tuning parameters

  4. How to pass bounds, constraints, derivatives, batch evaluators, etc.

  5. How to get results back from the optimizer

Objective functions in pygmo#

To add pygmo_gaco, let’s start by looking at the pygmo tutorials. Objective functions are coded up via the Problem class. We skip the pre-defined problems, which will not help us here, and go directly to user-defined problems.

The following is copied from the documentation:

import numpy as np
import pygmo as pg


class sphere_function:
    def fitness(self, x):
        return [sum(x * x)]

    def get_bounds(self):
        return ([-1, -1], [1, 1])


prob = pg.problem(sphere_function())

This looks simple enough. No subclassing is required: fitness implements the objective function, returning the objective value as a list containing a scalar, and get_bounds returns the bounds. We can immediately see how we would adjust this for any scalar objective function.
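For example, here is how a hypothetical Rosenbrock problem could look in the same style (this sketch is ours and not taken from the pygmo documentation):

import pygmo as pg


class rosenbrock_function:
    def fitness(self, x):
        # 2d Rosenbrock function; the minimum of 0 is at (1, 1)
        return [(1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2]

    def get_bounds(self):
        return ([-2, -2], [2, 2])


prob = pg.problem(rosenbrock_function())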

How to run an optimization at default values#

After copy-pasting from a few tutorials, we arrive at the following:

# The initial population
pop = pg.population(prob, size=20)
# The algorithm; ker needs to be at most the population size to avoid errors
algo = pg.algorithm(pg.gaco(ker=20))
# The actual optimization process
pop = algo.evolve(pop)
# Getting the best individual in the population
best_fitness = pop.get_f()[pop.best_idx()]
print(best_fitness)
best_x = pop.get_x()[pop.best_idx()]
print(np.round(best_x, 4))
[2.90746791e-05]
[-0.0019 -0.0051]

It looks like the optimization worked, even though the precision is not great. The true optimal function value is 0 and the true optimal parameters are [0, 0]. But global algorithms like gaco are almost never precise, so this is good enough.

We can also see that pygmo is organized around concepts that are specific to evolutionary optimizers. Examples are population and evolve. The optimagic wrapper will hide the details (i.e. users don’t have to create a population) but still allow full customization (the population size will be an algorithm-specific option that can be set by the user).

How to pass tuning parameters#

We already saw in the previous step that tuning parameters like ker are passed when the algorithm is created.

All supported tuning parameters of gaco are listed and described here. Unfortunately, the description is not great, so we’ll have to look into the paper for details.
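For illustration, several of the parameters described in step 2 below can be set when the algorithm is constructed (the values here are arbitrary):

algo = pg.algorithm(pg.gaco(gen=250, ker=20, q=1.0, seed=42))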

How to pass bounds, constraints, derivatives, batch evaluators, etc.#

  • We already saw how to pass bounds via the Problem class

  • gaco does not support any other constraints, so we don’t need to pass them

  • gaco is derivative free, so we don’t need to pass derivatives

  • gaco can parallelize, so we need to find out how to pass a batch version of the objective function

After searching around in the pygmo documentation, we find out that our Problem needs to be extended with a batch_fitness method and that our algorithm needs to know about pg.bfe(). Applied to our previous example, it looks like this:

import numpy as np
import pygmo as pg


class sphere_function:
    def fitness(self, x):
        return [sum(x * x)]

    def get_bounds(self):
        return ([-1, -1], [1, 1])

    # dvs represents a batch of parameter vectors at which the objective function is
    # evaluated. However it is stored in an unintuitive format that needs to be reshaped
    # to get at the actual parameter vectors.
    def batch_fitness(self, dvs):
        dim = len(self.get_bounds()[0])
        x_list = list(dvs.reshape(-1, dim))
        # we don't actually need to parallelize to find out how batch evaluators work
        # and optimagic will make it really easy to parallelize this later on.
        eval_list = [self.fitness(x)[0] for x in x_list]
        evals = np.array(eval_list)
        return evals


prob = pg.problem(sphere_function())

pop = pg.population(prob, size=20)

# creating the algorithm now requires 3 steps
pygmo_uda = pg.gaco(ker=20)
pygmo_uda.set_bfe(pg.bfe())
algo = pg.algorithm(pygmo_uda)

pop = algo.evolve(pop)
best_fitness = pop.get_f()[pop.best_idx()]
print(best_fitness)
best_x = pop.get_x()[pop.best_idx()]
print(np.round(best_x, 4))
[1.09429677e-06]
[0.0009 0.0006]

For this how-to guide we leave it at this basic exploration of the pygmo library. If you actually contribute an optimizer to optimagic, you will have to explore much more and document your exploration to convince us that you understand the library you are wrapping in detail.

How to get results back#

The results are stored as part of the evolved population:

print("Best function value: ", pop.get_f()[pop.best_idx()][0])
print("Best parameters: ", pop.get_x()[pop.best_idx()])
print("Number of function evaluations: ", pop.problem.get_fevals())
Best function value:  1.094296768484577e-06
Best parameters:  [0.00088044 0.00056491]
Number of function evaluations:  2020

2. Understand how the algorithm works#

Here we want to find out as much as possible about the algorithm. Common questions that should be answered are:

  • For which kind of problems and situations was it designed?

  • How does it work (intuitively)?

  • Are there any papers, blogposts or other sources of information on the algorithm?

  • Which tuning parameters does it have and what do they mean?

  • Are there known limitations?

For which kind of problems and situations was it designed?#

gaco is a global optimizer that does not use derivative information. It should not be used if you only need a local optimum or if you have derivatives. Other algorithms would be more efficient and more precise there.

Since gaco can evaluate the objective function in parallel it is designed for problems with expensive objective functions.

How does it work (intuitively)#

Ant colony optimization is a class of optimization algorithms modeled on the actions of an ant colony. Artificial “ants” (e.g. simulation agents) locate optimal solutions by moving through a parameter space representing all possible solutions. Real ants lay down pheromones directing each other to resources while exploring their environment. The simulated “ants” similarly record their positions and the quality of their solutions, so that in later simulation iterations more ants locate better solutions.

The generalized ant colony algorithm generates future generations of ants by using a multi-kernel Gaussian distribution based on three parameters (i.e., pheromone values) which are computed depending on the quality of each previous solution. The solutions are ranked through an oracle penalty method.
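For intuition, continuous ant colony methods of this family (e.g. the ACO_R algorithm that gaco extends) sample each coordinate $i$ of a new solution from a Gaussian mixture over the $k$ solutions stored in the archive. The following is a sketch of that sampling density, not the exact formula used by pygmo:

$$G^i(x) = \sum_{l=1}^{k} \omega_l \, \frac{1}{\sigma_l^i \sqrt{2\pi}} \exp\left(-\frac{(x - \mu_l^i)^2}{2 (\sigma_l^i)^2}\right)$$

Here the means $\mu_l^i$ are the coordinates of the archived solutions, the weights $\omega_l$ decrease with the rank of solution $l$, and the standard deviations $\sigma_l^i$ shrink as the archive concentrates around good solutions.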

Are there any papers, blogposts or other sources of information on the algorithm?#

gaco was proposed in M. Schlueter et al. (2009). Extended ant colony optimization for non-convex mixed integer non-linear programming. Computers & Operations Research.

See here for a free pdf.

Which tuning parameters does it have and what do they mean?#

The following is not just copied from the documentation but extended by reading the paper. It is super important to provide as much information as possible for every tuning parameter:

  • gen (int): number of generations.

  • ker (int): number of solutions stored in the solution archive. Must be <= the population size.

  • q (float): convergence speed parameter. This parameter manages the convergence speed towards the found minima (the smaller the faster). It must be positive and can be larger than 1. The default is 1.0 until threshold is reached. Then it is set to 0.01.

  • oracle (float): oracle parameter used in the penalty method.

  • acc (float): accuracy parameter for maintaining a minimum distance between penalty function values.

  • threshold (int): when the iteration counter reaches the threshold, the convergence speed is set to 0.01 automatically. To deactivate this effect, set the threshold to stopping.maxiter, which is the largest allowed value.

  • n_gen_mark (int): parameter that determines the convergence speed of the standard deviations. This must be an integer.

  • impstop (int): if a positive integer is assigned here, the algorithm counts the runs without improvement; if this count exceeds the given value, the algorithm stops.

  • evalstop (int): maximum number of function evaluations.

  • focus (float): this parameter makes the search for the optimum greedier and more focused on local improvements (the higher the greedier). If the value is very high, the search is more focused around the current best solutions. Values larger than 1 are allowed.

  • memory (bool): if True, memory is activated in the algorithm for multiple calls.

  • seed (int): seed used by the internal random number generator (default is random).

Are there known limitations?#

No.

3. Implement the minimal wrapper#

Learn the relevant functions and classes#

Before you implement a minimal wrapper, you need to familiarize yourself with a few important classes and functions you will need.

  • The mark.minimizer decorator

  • The Algorithm class

  • The InternalOptimizationProblem class

  • The InternalOptimizeResult class

Your task will be to subclass Algorithm. Your subclass must be decorated with mark.minimizer and override Algorithm._solve_internal_problem. _solve_internal_problem takes an InternalOptimizationProblem and returns an InternalOptimizeResult.

Note

Users of optimagic never create instances of InternalOptimizationProblem, nor do they call the _solve_internal_problem methods of algorithms. Instead they call minimize or maximize, which are much more convenient and flexible.

minimize and maximize will then create an InternalOptimizationProblem from the user’s inputs, call the _solve_internal_problem method, and post-process the result to create an OptimizeResult.

To summarize: The public minimize interface is optimized for user-friendliness. The InternalOptimizationProblem is optimized for easy wrapping of external libraries.

Below we define a heavily commented minimal version of a wrapper for pygmo’s gaco algorithm. We stay as close as possible to the pygmo examples we have worked with before and ignore most tuning parameters for now.

Write the minimal implementation#

from dataclasses import dataclass

import numpy as np
from numpy.typing import NDArray

import optimagic as om
from optimagic.optimization.algorithm import Algorithm, InternalOptimizeResult
from optimagic.optimization.internal_optimization_problem import (
    InternalOptimizationProblem,
)
from optimagic.typing import AggregationLevel, PositiveInt

try:
    import pygmo as pg

    IS_PYGMO_INSTALLED = True
except ImportError:
    IS_PYGMO_INSTALLED = False


@om.mark.minimizer(
    # you can pick the name; convention is lowercase with underscores
    name="pygmo_gaco",
    # the type of problem this optimizer can solve -> scalar problems; Other optimizers
    # solve likelihood or least_squares problems.
    solver_type=AggregationLevel.SCALAR,
    # is the optimizer available? -> only if pygmo is installed
    is_available=IS_PYGMO_INSTALLED,
    # is the optimizer a global optimizer? -> yes
    is_global=True,
    # does the optimizer need the jacobian? -> no, gaco is derivative free
    needs_jac=False,
    # does the optimizer need the hessian? -> no, gaco is derivative free
    needs_hess=False,
    # does the optimizer support parallelism? -> yes
    supports_parallelism=True,
    # does the optimizer support bounds? -> yes
    supports_bounds=True,
    # does the optimizer support linear constraints? -> no
    supports_linear_constraints=False,
    # does the optimizer support nonlinear constraints? -> no
    supports_nonlinear_constraints=False,
    # should the history be disabled? -> no
    disable_history=False,
)
# All algorithms need to be frozen dataclasses.
@dataclass(frozen=True)
class PygmoGaco(Algorithm):
    # For now we only set two parameters to get things running. The rest will come later.
    stopping_maxiter: PositiveInt = 1000
    n_cores: int = 1

    def _solve_internal_problem(
        self, problem: InternalOptimizationProblem, x0: NDArray[np.float64]
    ) -> InternalOptimizeResult:
        # create a pygmo problem from the internal optimization problem
        # This is just slightly more abstract than before and actually simpler because
        # we have problem.batch_fun.

        n_cores = self.n_cores

        class PygmoProblem:
            def fitness(self, x):
                # problem.fun is not just the `fun` that was passed to `minimize` by
                # the user. It is a wrapper around fun with added error handling,
                # history collection, and reparametrization to enforce constraints.
                # Moreover, it always works on flat numpy arrays as parameters and
                # does not have additional arguments. The magic of optimagic is to
                # create this internal `fun` from the user's `fun`, so you don't have
                # to deal with constraints, weird parameter formats and similar when
                # implementing the wrapper.
                return [problem.fun(x)]

            def get_bounds(self):
                # problem.bounds is not just the `bounds` that was passed to `minimize`
                # by the user, which could have been a dictionary or some other non-flat
                # format. `problem.bounds` always contains flat arrays with lower and
                # upper bounds because this makes it easy to write wrappers.
                return (problem.bounds.lower, problem.bounds.upper)

            def batch_fitness(self, dvs):
                # The processing of dvs is pygmo specific.
                dim = len(self.get_bounds()[0])
                x_list = list(dvs.reshape(-1, dim))
                # problem.batch_fun is a parallelized version of problem.fun.
                eval_list = problem.batch_fun(x_list, n_cores)
                evals = np.array(eval_list)
                return evals

        prob = pg.problem(PygmoProblem())
        pop = pg.population(prob, size=20)
        pygmo_uda = pg.gaco(ker=20)
        pygmo_uda.set_bfe(pg.bfe())
        algo = pg.algorithm(pygmo_uda)
        pop = algo.evolve(pop)
        best_fun = pop.get_f()[pop.best_idx()][0]
        best_x = pop.get_x()[pop.best_idx()]
        n_fun_evals = pop.problem.get_fevals()
        # For now we only use a few fields of the InternalOptimizeResult.
        out = InternalOptimizeResult(
            x=best_x,
            fun=best_fun,
            n_fun_evals=n_fun_evals,
        )
        return out

Test the minimal wrapper directly#

So now that we have a wrapper, what do we do with it? And how can we be sure it works?

We’ll first try it out directly with the SphereExampleInternalOptimizationProblem. This is only for debugging and testing purposes. A user would never create an InternalOptimizationProblem and call an algorithm with it. It’s called “Internal” for a reason!

from optimagic.optimization.internal_optimization_problem import (
    SphereExampleInternalOptimizationProblem,
)

problem = SphereExampleInternalOptimizationProblem()

gaco = PygmoGaco()

result = gaco._solve_internal_problem(problem, x0=np.array([1.0, 1.0]))

print(result.fun)
print(result.x)
print(result.n_fun_evals)
0.00263270999224242
[ 0.01321834  0.01577229 -0.01486191 -0.00759967 -0.00395595 -0.0173868
  0.01081493 -0.01195463 -0.02494278 -0.02702999]
2020

Use the minimal wrapper in minimize#

The internal testing gives us some confidence that the wrapper works correctly, and it would have been helpful for debugging if it didn’t. But now we want to test the wrapper in the way it will be used later: via minimize.

With this we also get all the benefits of optimagic, from history collection and criterion plots to flexible parameter formats.

res = om.minimize(
    fun=lambda x: x @ x,
    params=np.arange(5),
    algorithm=PygmoGaco,
    bounds=om.Bounds(lower=-np.ones(5), upper=np.ones(5)),
)

om.criterion_plot(res, monotone=True)

4. Complete and refactor the wrapper#

To keep things simple, we left out almost all tuning parameters of the gaco algorithm when we wrote the minimal wrapper.

Now it’s time to add them. You can add them one by one and make sure nothing breaks by testing your wrapper after each change - both with the internal problem and via minimize.
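For example, a first incremental change could expose the population size, which we hard-coded to 20 in the minimal wrapper. The field name population_size is our own, more descriptive replacement for pygmo’s size argument; treat this as an illustrative sketch that reuses the imports from the minimal wrapper above:

@dataclass(frozen=True)
class PygmoGaco(Algorithm):
    stopping_maxiter: PositiveInt = 1000
    n_cores: int = 1
    # newly exposed tuning parameter; remember that ker must not exceed it
    population_size: PositiveInt = 20

Inside _solve_internal_problem, the hard-coded call then becomes pop = pg.population(prob, size=self.population_size). After this change, rerun both the internal test and the minimize call to make sure nothing broke.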

Moreover, our code currently looks quite messy. Despite being a minimal wrapper, the _solve_internal_problem method is quite long, unstructured, and hard to read.

The result of completing and refactoring the wrapper is too long to be repeated in this notebook. Instead, you can look at the actual implementation in optimagic.

The PygmoGaco class now contains all tuning parameters we identified in step 2 as dataclass fields. They all have very useful type hints that don’t just show whether a parameter is an int, str or float but also which values it can take (e.g. PositiveInt).

_solve_internal_problem is now also much cleaner. It mainly maps our more descriptive names of tuning parameters to the old pygmo names and then calls a function called _minimize_pygmo that does all the heavy lifting and can be re-used for other pygmo optimizers.
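Schematically, the refactored method could look like the following sketch. The option names kernel_size and convergence_speed as well as the exact signature of _minimize_pygmo are illustrative; see the linked implementation for the real ones:

def _solve_internal_problem(self, problem, x0):
    # translate our descriptive names back to pygmo's original names
    algo_options = {
        "gen": self.stopping_maxiter,
        "ker": self.kernel_size,
        "q": self.convergence_speed,
    }
    # delegate the pygmo-specific heavy lifting to a shared helper that
    # can be re-used by all pygmo wrappers
    return _minimize_pygmo(
        problem=problem,
        x0=x0,
        method="gaco",
        options=algo_options,
    )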

The arguments to mark.minimizer have not changed. They always need to be set correctly, even for minimal working examples.

5. Align the wrapper with optimagic conventions#

To make switching between different algorithms as simple as possible, we align the names of commonly used convergence and stopping criteria. We also align the default values for stopping and convergence criteria as much as possible.

You can find the harmonized names and values here.

To align the names of other tuning parameters as much as possible with what is already there, simply have a look at the optimizers we have already wrapped. For example, if you are wrapping a bfgs or lbfgs algorithm from some library, look at all existing wrappers of bfgs algorithms and use the same names for the same options.
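Concretely, this means preferring harmonized field names such as stopping_maxiter over library-specific ones such as gen, and descriptive names such as kernel_size over terse originals such as ker. The following sketch is illustrative; the defaults and the exact selection of names should be taken from the linked pages:

@dataclass(frozen=True)
class PygmoGaco(Algorithm):
    # harmonized names that are shared across optimizers
    stopping_maxiter: PositiveInt = 1000
    stopping_maxfun: PositiveInt = 1_000_000
    # algorithm-specific tuning parameters get descriptive names
    population_size: PositiveInt = 20
    kernel_size: PositiveInt = 20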

You can see what this means for the gaco algorithm here.

In the future we will provide much more extensive guidelines for harmonization.

6. Integrate your code into optimagic#

So far you could have worked in a Jupyter Notebook. Integrating your code into optimagic only requires a few small changes:

  1. Add new dependencies to the environment.yml file and run pre-commit run --all-files. This will trigger a script that adds the dependencies to multiple environments we need for continuous integration. Then re-create the environment to make sure it is the same as the one we will use for continuous integration. If your dependencies don’t work on all platforms (e.g. Linux-only packages), skip this entire step and reach out to a core contributor for help.

  2. Save the code for your algorithm wrapper in a .py file in optimagic.algorithms. Use an existing file if you wrap another algorithm from a library we already wrap. Otherwise, create a new file.

  3. Run pre-commit run --all-files. This will trigger an automatic code generation that fully integrates your wrapper into our algorithm selection tool.

  4. Run pytest. This will run at least a few tests for your new algorithm. Add more tests for algorithm-specific things, e.g. tests that make sure tuning parameters have the intended effects; see the sketch after this list.

  5. Write documentation. The documentation should contain everything you figured out in step 2. You can either write it into the docstring of your algorithm class (preferred, as this is what we will do for all algorithms in the long run) or in algorithms.md in the documentation.

  6. Create a pull request and ask for a review.
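For instance, an algorithm-specific test as mentioned in step 4 could look like the following sketch; the file name, the import path of PygmoGaco, and the tolerance are illustrative:

# content of test_pygmo_gaco.py
import numpy as np

import optimagic as om
from optimagic.algorithms import PygmoGaco


def test_pygmo_gaco_finds_minimum_of_sphere():
    res = om.minimize(
        fun=lambda x: x @ x,
        params=0.5 * np.ones(3),
        algorithm=PygmoGaco,
        bounds=om.Bounds(lower=-np.ones(3), upper=np.ones(3)),
    )
    # global optimizers are rarely precise, so use a loose tolerance
    assert np.all(np.abs(res.params) < 0.1)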