# How to add optimizers to optimagic
This is a hands-on guide that shows you how to use custom optimizers with optimagic or how to contribute an optimizer to the optimagic library.
We have many examples of optimizers that are already part of optimagic and you can learn a lot from looking at those. However, only looking at the final results might be a bit intimidating and does not show the process of exploring a new optimizer library and gradually developing a wrapper.
This guide fills that gap. It tells the story of how the `pygmo_gaco` optimizer was added to optimagic by someone who was unfamiliar with pygmo and the gaco algorithm.
The steps of adding an algorithm are roughly as follows:

1. **Understand how to use the algorithm**: Play around with the algorithm you want to add in a notebook and solve some simple problems with it. Only move on to the next step after you have a solid understanding. This is completely unrelated to optimagic and only about the algorithm implementation you want to wrap.
2. **Understand how the algorithm works**: Read documentation, research papers, and other resources to find out why this algorithm was created and which problems it is supposed to solve really well.
3. **Implement the minimal wrapper**: Learn about the `om.mark.minimizer` decorator as well as the `om.InternalOptimizationProblem` and `om.Algorithm` classes. Implement a minimal version of your wrapper and test it.
4. **Complete and refactor the wrapper**: Make sure that all convergence criteria, stopping criteria, and tuning parameters the algorithm supports can be passed to your wrapper. Also check that the algorithm gets everything it needs to achieve maximum performance (e.g. closed-form derivatives and batch function evaluators). Now is also the time to clean up and refactor your code, especially if you wrap multiple optimizers from a library.
5. **Align the wrapper with optimagic conventions**: Use harmonized names wherever a convention exists. Think about good names everywhere else. Set stopping criteria similar to other optimizers and try to adhere to our design philosophy when it comes to tuning parameters.
6. **Integrate your code into optimagic**: Learn how to add an optional dependency to optimagic, where you need to put your code, and how to add tests and documentation.
## Gen AI Policy
It is okay to use GenAI and AI-based coding assistants to speed up the process of adding an optimizer to optimagic. They can be very useful for steps 1 and 2. However, AI models often fail completely when filling out the arguments of `om.mark.minimizer`, when you ask them to come up with good names for tuning parameters, or when you auto-generate the documentation.

Even for steps 1 and 2 you should not use an AI model naively, but upload a paper or documentation page to provide context to the AI.

Our policy is therefore:

- Only use AI for drafts that you double-check; never rely on AI producing correct results.
- Be transparent about your use of AI.

We will reject all pull requests that violate this policy.
## 1. Understand how to use the algorithm
Understanding how to use an algorithm means that you are at least able to solve a simple optimization problem (like a sphere function or a Rosenbrock function) with it.

The best starting points are usually tutorials or example notebooks from the documentation. Asking an AI model can also be a good idea.

The things you need to find out for any new algorithm are:

- How to code up the objective function
- How to run an optimization at default values
- How to pass tuning parameters
- How to pass bounds, constraints, derivatives, batch evaluators, etc.
- How to get results back from the optimizer
### Objective functions in pygmo
To add pygmo_gaco, let's start by looking at the pygmo tutorials. Objective functions are coded up via the `Problem` class. We skip the pre-defined problems because they will not help us and go directly to user-defined problems.
The following is copied from the documentation:
```python
import numpy as np
import pygmo as pg


class sphere_function:
    def fitness(self, x):
        return [sum(x * x)]

    def get_bounds(self):
        return ([-1, -1], [1, 1])


prob = pg.problem(sphere_function())
```
This looks simple enough. No subclassing is required, `fitness` implements the objective function (returning the objective value as a list containing a scalar), and `get_bounds` returns the bounds. We can immediately see how we would adjust this for any scalar objective function.
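For example, a minimal adaptation for the 2d Rosenbrock function (our own illustration, not taken from the pygmo docs) could look like this:

```python
class rosenbrock_function:
    def fitness(self, x):
        # scalar Rosenbrock objective, returned as a list with one entry
        return [(1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2]

    def get_bounds(self):
        # lower and upper bounds for both parameters
        return ([-2, -2], [2, 2])


prob = pg.problem(rosenbrock_function())
```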
### How to run an optimization at default values
After copy-pasting from a few tutorials, we arrive at the following:
```python
# The initial population
pop = pg.population(prob, size=20)
# The algorithm; ker needs to be at most the population size to avoid errors
algo = pg.algorithm(pg.gaco(ker=20))
# The actual optimization process
pop = algo.evolve(pop)
# Getting the best individual in the population
best_fitness = pop.get_f()[pop.best_idx()]
print(best_fitness)
best_x = pop.get_x()[pop.best_idx()]
print(np.round(best_x, 4))
```

```
[2.90746791e-05]
[-0.0019 -0.0051]
```
It looks like the optimization worked, even though the precision is not great. The true optimal function value is 0 and the true optimal parameters are [0, 0]. But global algorithms like gaco are almost never precise, so this is good enough.

We can also see that pygmo is really organized around concepts that are specific to genetic optimizers, such as `population` and `evolve`. The optimagic wrapper will hide the details (i.e. users don't have to create a population) but still allow full customization (the population size will be an algorithm-specific option that can be set by the user).
### How to pass tuning parameters
We already saw in the previous step that tuning parameters like `ker` are passed when the algorithm is created.

All supported tuning parameters of gaco are listed and described here. Unfortunately, the description is not great, so we'll have to look into the paper for details.
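For instance, following the pattern from above, several tuning parameters could be passed at construction like this (the values are purely illustrative):

```python
# gen, ker, q and seed are documented gaco parameters; the values are arbitrary
algo = pg.algorithm(pg.gaco(gen=100, ker=20, q=1.0, seed=42))
```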
### How to pass bounds, constraints, derivatives, batch evaluators, etc.
- We already saw how to pass bounds via the `Problem` class.
- gaco does not support any other constraints, so we don't need to pass them.
- gaco is derivative-free, so we don't need to pass derivatives.
- gaco can parallelize, so we need to find out how to pass a batch version of the objective function.
After searching around in the pygmo documentation, we find out that our `Problem` needs to be extended with a `batch_fitness` method and our algorithm needs to know about `pg.bfe()`.

Applied to our previous example, it looks like this:
```python
import numpy as np
import pygmo as pg


class sphere_function:
    def fitness(self, x):
        return [sum(x * x)]

    def get_bounds(self):
        return ([-1, -1], [1, 1])

    # dvs represents a batch of parameter vectors at which the objective function is
    # evaluated. However, it is stored in an unintuitive format that needs to be
    # reshaped to get at the actual parameter vectors.
    def batch_fitness(self, dvs):
        dim = len(self.get_bounds()[0])
        x_list = list(dvs.reshape(-1, dim))
        # we don't actually need to parallelize to find out how batch evaluators work
        # and optimagic will make it really easy to parallelize this later on.
        eval_list = [self.fitness(x)[0] for x in x_list]
        evals = np.array(eval_list)
        return evals


prob = pg.problem(sphere_function())
pop = pg.population(prob, size=20)

# creating the algorithm now requires 3 steps
pygmo_uda = pg.gaco(ker=20)
pygmo_uda.set_bfe(pg.bfe())
algo = pg.algorithm(pygmo_uda)

pop = algo.evolve(pop)

best_fitness = pop.get_f()[pop.best_idx()]
print(best_fitness)
best_x = pop.get_x()[pop.best_idx()]
print(np.round(best_x, 4))
```

```
[1.09429677e-06]
[0.0009 0.0006]
```
For this how-to guide we leave it at this basic exploration of the pygmo library. If you actually contributed an optimizer to optimagic, you would have to explore much more and document your exploration to convince us that you understand the library you wrap in detail.
### How to get results back
The results are stored as part of the evolved population:

```python
print("Best function value: ", pop.get_f()[pop.best_idx()][0])
print("Best parameters: ", pop.get_x()[pop.best_idx()])
print("Number of function evaluations: ", pop.problem.get_fevals())
```

```
Best function value:  1.094296768484577e-06
Best parameters:  [0.00088044 0.00056491]
Number of function evaluations:  2020
```
## 2. Understand how the algorithm works
Here we want to find out as much as possible about the algorithm. Common questions that should be answered are:

- For which kind of problems and situations was it designed?
- How does it work (intuitively)?
- Are there any papers, blogposts or other sources of information on the algorithm?
- Which tuning parameters does it have and what do they mean?
- Are there known limitations?
### For which kind of problems and situations was it designed?
gaco is a global optimizer that does not use derivative information. It should not be used if you only need a local optimum or if you have derivatives. Other algorithms would be more efficient and more precise there.
Since gaco can evaluate the objective function in parallel, it is designed for problems with expensive objective functions.
### How does it work (intuitively)?
Ant colony optimization is a class of optimization algorithms modeled on the actions of an ant colony. Artificial “ants” (e.g. simulation agents) locate optimal solutions by moving through a parameter space representing all possible solutions. Real ants lay down pheromones directing each other to resources while exploring their environment. The simulated “ants” similarly record their positions and the quality of their solutions, so that in later simulation iterations more ants locate better solutions.
The generalized ant colony algorithm generates future generations of ants by using a multi-kernel gaussian distribution based on three parameters (i.e., pheromone values) which are computed depending on the quality of each previous solution. The solutions are ranked through an oracle penalty method.
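To make this intuition concrete, here is a heavily simplified toy version of continuous ant colony optimization. This is our own schematic sketch, not pygmo's actual gaco implementation (which adds the oracle penalty method, the multi-kernel machinery, the threshold mechanism, and more):

```python
import numpy as np


def toy_ant_colony(fun, lower, upper, n_ants=20, ker=10, n_gen=100, q=1.0, seed=0):
    rng = np.random.default_rng(seed)
    dim = len(lower)
    # solution archive: the "pheromone trails" are the best solutions found so far
    archive = rng.uniform(lower, upper, size=(ker, dim))
    fvals = np.array([fun(x) for x in archive])
    for _ in range(n_gen):
        order = np.argsort(fvals)
        archive, fvals = archive[order], fvals[order]
        # better-ranked solutions get a higher sampling weight ("more pheromone");
        # a smaller q concentrates weight on the best solutions (faster convergence)
        ranks = np.arange(ker)
        weights = np.exp(-(ranks**2) / (2 * (q * ker) ** 2))
        weights /= weights.sum()
        # each ant picks an archive member and samples a new point from a gaussian
        # around it, with a standard deviation based on the spread of the archive
        kernels = rng.choice(ker, size=n_ants, p=weights)
        sigma = archive.std(axis=0) + 1e-12
        ants = archive[kernels] + rng.normal(0.0, sigma, size=(n_ants, dim))
        ants = np.clip(ants, lower, upper)
        ant_fvals = np.array([fun(x) for x in ants])
        # merge the ants into the archive and keep only the best ker solutions
        pool_x = np.vstack([archive, ants])
        pool_f = np.concatenate([fvals, ant_fvals])
        keep = np.argsort(pool_f)[:ker]
        archive, fvals = pool_x[keep], pool_f[keep]
    return archive[0], fvals[0]


best_x, best_f = toy_ant_colony(lambda x: x @ x, lower=-np.ones(2), upper=np.ones(2))
```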
### Are there any papers, blogposts or other sources of information on the algorithm?
gaco was proposed in M. Schlueter et al. (2009), "Extended ant colony optimization for non-convex mixed integer non-linear programming", *Computers & Operations Research*. See here for a free PDF.
### Which tuning parameters does it have and what do they mean?
The following is not just copied from the documentation but extended by reading the paper. It is super important to provide as much information as possible for every tuning parameter:
- `gen` (int): number of generations.
- `ker` (int): number of solutions stored in the solution archive. Must be <= the population size.
- `q` (float): convergence speed parameter. This parameter manages the convergence speed towards the found minima (the smaller the faster). It must be positive and can be larger than 1. The default is 1.0 until `threshold` is reached; then it is set to 0.01.
- `oracle` (float): oracle parameter used in the penalty method.
- `acc` (float): accuracy parameter for maintaining a minimum penalty function's values distances.
- `threshold` (int): when the iteration counter reaches the threshold, the convergence speed is set to 0.01 automatically. To deactivate this effect, set the threshold to `stopping.maxiter`, which is the largest allowed value.
- `n_gen_mark` (int): parameter that determines the convergence speed of the standard deviations. This must be an integer.
- `impstop` (int): if a positive integer is assigned here, the algorithm counts the runs without improvements; if this number exceeds the given value, the algorithm is stopped.
- `evalstop` (int): maximum number of function evaluations.
- `focus` (float): this parameter makes the search for the optimum greedier and more focused on local improvements (the higher the greedier). If the value is very high, the search is more focused around the current best solutions. Values larger than 1 are allowed.
- `memory` (bool): if True, memory is activated in the algorithm for multiple calls.
- `seed` (int): seed used by the internal random number generator (default is random).
### Are there known limitations?

No.
## 3. Implement the minimal wrapper
### Learn the relevant functions and classes
Before you implement a minimal wrapper, you need to familiarize yourself with a few important classes and functions:

- The `mark.minimizer` decorator
- The `Algorithm` class
- The `InternalOptimizationProblem` class
- The `InternalOptimizeResult` class
Your task will be to subclass `Algorithm`. Your subclass must be decorated with `mark.minimizer` and override `Algorithm._solve_internal_problem`. `_solve_internal_problem` takes an `InternalOptimizationProblem` and returns an `InternalOptimizeResult`.
> **Note:** Users of optimagic never create instances of `InternalOptimizationProblem`, nor do they call the `_solve_internal_problem` methods of algorithms. Instead, they call `minimize` or `maximize`, which are much more convenient and flexible. `minimize` and `maximize` will then create an `InternalOptimizationProblem` from the user's inputs, call the `_solve_internal_problem` method, and postprocess its output to create an `OptimizeResult`.
>
> To summarize: the public `minimize` interface is optimized for user-friendliness. The `InternalOptimizationProblem` is optimized for easy wrapping of external libraries.
Below we define a heavily commented minimal version of a wrapper for pygmo’s gaco algorithm. We stay as close as possible to the pygmo examples we have worked with before and ignore most tuning parameters for now.
### Write the minimal implementation
```python
from dataclasses import dataclass

import numpy as np
from numpy.typing import NDArray

import optimagic as om
from optimagic.optimization.algorithm import Algorithm, InternalOptimizeResult
from optimagic.optimization.internal_optimization_problem import (
    InternalOptimizationProblem,
)
from optimagic.typing import AggregationLevel, PositiveInt

try:
    import pygmo as pg

    IS_PYGMO_INSTALLED = True
except ImportError:
    IS_PYGMO_INSTALLED = False


@om.mark.minimizer(
    # you can pick the name; convention is lowercase with underscores
    name="pygmo_gaco",
    # the type of problem this optimizer can solve -> scalar problems; other optimizers
    # solve likelihood or least_squares problems.
    solver_type=AggregationLevel.SCALAR,
    # is the optimizer available? -> only if pygmo is installed
    is_available=IS_PYGMO_INSTALLED,
    # is the optimizer a global optimizer? -> yes
    is_global=True,
    # does the optimizer need the jacobian? -> no, gaco is derivative-free
    needs_jac=False,
    # does the optimizer need the hessian? -> no, gaco is derivative-free
    needs_hess=False,
    # does the optimizer support parallelism? -> yes
    supports_parallelism=True,
    # does the optimizer support bounds? -> yes
    supports_bounds=True,
    # does the optimizer support linear constraints? -> no
    supports_linear_constraints=False,
    # does the optimizer support nonlinear constraints? -> no
    supports_nonlinear_constraints=False,
    # should the history be disabled? -> no
    disable_history=False,
)
# All algorithms need to be frozen dataclasses.
@dataclass(frozen=True)
class PygmoGaco(Algorithm):
    # for now only set one parameter to get things running. The rest will come later.
    stopping_maxiter: PositiveInt = 1000
    n_cores: int = 1

    def _solve_internal_problem(
        self, problem: InternalOptimizationProblem, x0: NDArray[np.float64]
    ) -> InternalOptimizeResult:
        # create a pygmo problem from the internal optimization problem.
        # This is just slightly more abstract than before and actually simpler because
        # we have problem.batch_fun.
        n_cores = self.n_cores

        class PygmoProblem:
            def fitness(self, x):
                # problem.fun is not just the `fun` that was passed to `minimize` by
                # the user. It is a wrapper around fun with added error handling,
                # history collection, and reparametrization to enforce constraints.
                # Moreover, it always works on flat numpy arrays as parameters and
                # does not have additional arguments. The magic of optimagic is to
                # create this internal `fun` from the user's `fun`, so you don't have
                # to deal with constraints, weird parameter formats and similar when
                # implementing the wrapper.
                return [problem.fun(x)]

            def get_bounds(self):
                # problem.bounds is not just the `bounds` that was passed to `minimize`
                # by the user, which could have been a dictionary or some other
                # non-flat format. `problem.bounds` always contains flat arrays with
                # lower and upper bounds because this makes it easy to write wrappers.
                return (problem.bounds.lower, problem.bounds.upper)

            def batch_fitness(self, dvs):
                # The processing of dvs is pygmo specific.
                dim = len(self.get_bounds()[0])
                x_list = list(dvs.reshape(-1, dim))
                # problem.batch_fun is a parallelized version of problem.fun.
                eval_list = problem.batch_fun(x_list, n_cores)
                evals = np.array(eval_list)
                return evals

        prob = pg.problem(PygmoProblem())
        pop = pg.population(prob, size=20)

        pygmo_uda = pg.gaco(ker=20)
        pygmo_uda.set_bfe(pg.bfe())
        algo = pg.algorithm(pygmo_uda)

        pop = algo.evolve(pop)

        best_fun = pop.get_f()[pop.best_idx()][0]
        best_x = pop.get_x()[pop.best_idx()]
        n_fun_evals = pop.problem.get_fevals()

        # For now we only use a few fields of the InternalOptimizeResult.
        out = InternalOptimizeResult(
            x=best_x,
            fun=best_fun,
            n_fun_evals=n_fun_evals,
        )
        return out
```
### Test the minimal wrapper directly
So now that we have a wrapper, what do we do with it? And how can we be sure it works?

We'll first try it out directly with the `SphereExampleInternalOptimizationProblem`. This is only for debugging and testing purposes. A user would never create an `InternalOptimizationProblem` and call an algorithm with it. It's called "internal" for a reason!
```python
from optimagic.optimization.internal_optimization_problem import (
    SphereExampleInternalOptimizationProblem,
)

problem = SphereExampleInternalOptimizationProblem()

gaco = PygmoGaco()
result = gaco._solve_internal_problem(problem, x0=np.array([1.0, 1.0]))
print(result.fun)
print(result.x)
print(result.n_fun_evals)
```

```
0.00263270999224242
[ 0.01321834  0.01577229 -0.01486191 -0.00759967 -0.00395595 -0.0173868
  0.01081493 -0.01195463 -0.02494278 -0.02702999]
2020
```
### Use the minimal wrapper in minimize
The internal testing gives us some confidence that the wrapper works correctly and would have been good for debugging if it didn't. But now we want to test the wrapper in the way it would be used later: via `minimize`.

With this we also get all the benefits of optimagic, from history collection and criterion plots to flexible parameter formats.
```python
res = om.minimize(
    fun=lambda x: x @ x,
    params=np.arange(5),
    algorithm=PygmoGaco,
    bounds=om.Bounds(lower=-np.ones(5), upper=np.ones(5)),
)

om.criterion_plot(res, monotone=True)
```
## 4. Complete and refactor the wrapper
To keep things simple, we left out almost all tuning parameters of the gaco algorithm when we wrote the minimal wrapper.

Now it's time to add them. You can add them one by one and make sure nothing breaks by testing your wrapper after each change, both with the internal problem and via `minimize`, for example as sketched below.
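Such a quick check could look like this; note that the `gen` field is hypothetical at this point (we have not added it yet), and the loose tolerance accounts for gaco being an imprecise global optimizer:

```python
res = om.minimize(
    fun=lambda x: x @ x,
    params=np.arange(3),
    algorithm=PygmoGaco(gen=250),
    bounds=om.Bounds(lower=-np.ones(3), upper=np.ones(3)),
)
assert res.fun < 0.1
```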
Moreover, our code currently looks quite messy. Despite being a minimal wrapper, the `_solve_internal_problem` method is quite long, unstructured, and hard to read.

The result of completing and refactoring the wrapper is too long to be repeated in this notebook. Instead, you can look at the actual implementation in optimagic.

The `PygmoGaco` class now contains all tuning parameters we identified in step 2 as dataclass fields. They all have very useful type hints that don't just show whether a parameter is an int, str, or float but also which values it can take (e.g. `PositiveInt`).
`_solve_internal_problem` is now also much cleaner. It mainly maps our more descriptive names of tuning parameters to the old pygmo names and then calls a function called `_minimize_pygmo` that does all the heavy lifting and can be re-used for other pygmo optimizers. The name-mapping idea is sketched below.
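The following self-contained sketch illustrates the renaming idea. The descriptive names (`stopping_maxiter`, `kernel_size`, `speed_parameter_q`) and the helper function are illustrative assumptions, not necessarily the exact names used in optimagic; check the actual implementation for the names we settled on:

```python
# maps illustrative descriptive field names to pygmo's terse option names
GACO_NAME_MAPPING = {
    "stopping_maxiter": "gen",
    "kernel_size": "ker",
    "speed_parameter_q": "q",
}


def to_pygmo_options(fields: dict) -> dict:
    """Translate descriptive optimagic field names to pygmo's short names."""
    return {GACO_NAME_MAPPING[name]: value for name, value in fields.items()}


print(to_pygmo_options({"stopping_maxiter": 1000, "kernel_size": 63}))
# {'gen': 1000, 'ker': 63}
```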
The arguments to `mark.minimizer` have not changed. They always need to be set correctly, even for minimal working examples.
## 5. Align the wrapper with optimagic conventions
To make switching between different algorithms as simple as possible, we align the names of commonly used convergence and stopping criteria. We also align the default values for stopping and convergence criteria as much as possible.

You can find the harmonized names and values here.

To align the names of other tuning parameters as much as possible with what is already there, simply have a look at the optimizers we already wrapped. For example, if you are wrapping a bfgs or lbfgs algorithm from some library, look at all existing wrappers of bfgs algorithms and use the same names for the same options.

You can see what this means for the gaco algorithm here.
In the future we will provide much more extensive guidelines for harmonization.
## 6. Integrate your code into optimagic
So far, you could have worked in a Jupyter notebook. Integrating your code into optimagic only requires a few small changes:

1. Add new dependencies to the `environment.yml` file and run `pre-commit run --all-files`. This will trigger a script that adds the dependencies to multiple environments we need for continuous integration. Then re-create the environment to make sure it matches the one we will use for continuous integration. If your dependencies don't work on all platforms (e.g. Linux-only packages), skip this entire step and reach out to a core contributor for help.
2. Save the code for your algorithm wrapper in a `.py` file in `optimagic.algorithms`. Use an existing file if you wrap another algorithm from a library we already had. Otherwise, create a new file.
3. Run `pre-commit run --all-files`. This will trigger an automatic code generation that fully integrates your wrapper into our algorithm selection tool.
4. Run `pytest`. This will run at least a few tests for your new algorithm. Add more tests for algorithm-specific things (e.g. tests that make sure tuning parameters have the intended effects); a hypothetical example is sketched below.
5. Write documentation. The documentation should contain everything you figured out in step 2. You can either write it into the docstring of your algorithm class (preferred, as this is what we will do for all algorithms in the long run) or in `algorithms.md` in the documentation.
6. Create a pull request and ask for a review.
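As an illustration, such an algorithm-specific test could look roughly like this (the test name and the loose tolerance are our own suggestions, not prescribed by optimagic):

```python
import numpy as np
import optimagic as om


def test_pygmo_gaco_solves_sphere():
    res = om.minimize(
        fun=lambda x: x @ x,
        params=np.arange(3),
        algorithm="pygmo_gaco",
        bounds=om.Bounds(lower=-np.ones(3), upper=np.ones(3)),
    )
    # loose tolerance because global optimizers are not very precise
    assert res.fun < 0.1
```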