Richardson Extrapolation#

In this section we introduce the mathematical machinery of Richardson’s method.


Say you want to compute the value of some function \(g: \mathbb{R}_+ \to \mathbb{R}^{m\times n}, h \mapsto g(h)\) as \(h \to 0\); however, \(\lim_{h\to\infty} g(h)\neq g(0)\). We can approximate the limit by evaluating the function at values close to zero on a computer. The error of our approximation naturally depends on \(g\). In certain cases it is possible to express this error in a specific way, in which case we can improve upon the order of our error using Richardson’s method.


Lets start with an easy case where \(f: \mathbb{R} \to \mathbb{R}\) is the function of interest. Using central differences we can approximate \(f'\) at some point \(x \in \mathbb{R}\) by \(g(h) := \frac{f(x+h) - f(x-h)}{2h}\). Note that \(g(h) \to f'(x)\) as \(h \to 0\) if \(f\) is differentiable at \(x\); however, \(g(0)\) is not defined and hence in particular unequal to \(f'(x)\). To quantify the error of using \(g(h)\) instead of \(f'(x)\) we can rely on Taylor’s Theorem (assuming that \(f\) has a Taylor representation):

\[\begin{split} f(x+h) &= f(x) + f'(x)h + f''(x)\frac{h^2}{2} + f'''(x)\frac{h^3}{6} + \dots\\ f(x-h) &= f(x) - f'(x)h + f''(x)\frac{h^2}{2} - f'''(x)\frac{h^3}{6} - \dots\\[1em] \implies& f(x+h) - f(x-h) = 2hf'(x) + 2\frac{h^3}{6} f'''(x) + 2\frac{h^5}{5!} f^{(5)}(x) + \dots \\ \implies& g(h) \stackrel{def}{=} \frac{f(x+h) - f(x-h)}{2h} = f'(x) + h^2 \frac{f'''(x)}{3!} + h^4 \frac{f^{(5)}(x)}{5!} + \dots \\ \implies& g(h) = f'(x) + \sum_{i=0}^{\infty} a_i h^{2+2i} = f'(x) + \mathcal{O}(h^2) \end{split}\]

where \(\mathcal{O}(\cdot)\) denotes the Landau notation. Richardson’s method can be used to improve the error rate \(\mathcal{O}(h^2)\).

General case#

In general Richardson’s method considers sequences that can be written as:

\[ g(h) = L + \sum_{i=0}^{\infty} a_i h^{\theta +i \phi,} \]

where \(L \in \mathbb{R}\) denotes the limit of interest, \(\theta\) the base order of the approximation and \(\phi\) the exponential step. Allthough Richardson’s method works for general sequences, we are mostly interested in the sequences arising when estimating derivatives.

Example (contd.)#

For standard derivative estimates we have





forward diff.




backward diff.




central diff.




Richardson Extrapolation#

From the above table we see that, in general, central differences have a lower approximation error \(\mathcal{O}(h^2)\) than forward or backward differences \(\mathcal{O}(h)\).

Question: Can we improve upon this further?

Let us evaluate \(g\) at multiple values \(h_0, h_1, h_2, \dots\), where it will turn out to be useful to choose values \(h, h/2, h/4, h/8, \dots\) given some prechosen \(h > 0\). More generally \(\{ h_n \}_n, h_n = h/2^n\) for \(n \in \mathbb{N}\). This allows us to write

\[\begin{split} g(h) &= L + \sum_{i=0}^{\infty} a_i h^{\theta +i \phi}\\ g(h/2) &= L + \sum_{i=0}^{\infty} a_i h^{\theta +i \phi} \frac{1}{2^{\theta +i \phi}}\\ g(h/4) &= L + \sum_{i=0}^{\infty} a_i h^{\theta +i \phi} \frac{1}{4^{\theta +i \phi}}\\ &\vdots \end{split}\]

Now approximate the \(g(h_n)\) by dropping all elements in the infinite sum after \(i=1\) and collect the approximation error using the term \(\eta(h_n)\):

\[\begin{split} g(h) &= \tilde{g}(h) + \eta(h) := L + \sum_{i=0}^{1} a_i h^{\theta +i \phi} \\ g(h/2) &= \tilde{g}(h/2) + \eta(h/2) := L + \sum_{i=0}^{1} a_i h^{\theta +i \phi} \frac{1}{2^{\theta +i \phi}}\\ g(h/4) &= \tilde{g}(h/4) + \eta(h/4) := L + \sum_{i=0}^{1} a_i h^{\theta +i \phi} \frac{1}{4^{\theta +i \phi}}\\ &\vdots \end{split}\]

Notice that we are now able to summarize the equations as

\[\begin{split} \begin{bmatrix} g(h) \\ g(h/2) \\ g(h/4) \end{bmatrix} = \begin{bmatrix} 1 & h^\theta & h^{\theta + \phi} \\ 1 & {h^\theta}/{2^\theta} & {h^{\theta + \phi}}/{(2^{\theta + \phi})} \\ 1 & {h^\theta}/{4^\theta} & {h^{\theta + \phi}}/{(4^{\theta + \phi})} \\ \end{bmatrix} \begin{bmatrix} L \\ a_0 \\ a_1 \end{bmatrix} + \begin{bmatrix} \eta (h)\\ \eta (h/2) \\ \eta (h/4) \end{bmatrix} \end{split}\]

which we write in shorthand notation as

\[\begin{split} (\ast): \,\,\, g = H \begin{bmatrix} L \\ a_0 \\ a_1 \end{bmatrix} + \eta \,. \end{split}\]

From looking at equation (\(\ast\)) we see that an improved estimate of \(L\) can be obtained by projecting \(g\) onto \(H\).


To get a better intuition for (\(\ast\)) consider \(H\) in more detail. For the sake of clarity let \(\theta = \phi = 2\).

\[\begin{split} H = \begin{bmatrix} 1 & h^2 & h^4 \\ 1 & h^2/2^2 & h^4/2^4 \\ 1 & h^2/4^2 & h^4/4^4 \\ \end{bmatrix} = \begin{bmatrix} 1 & h^2 & h^4 \\ 1 & (h/2)^2 & (h/2)^4 \\ 1 & (h/4)^2 & (h/4)^4 \\ \end{bmatrix} \end{split}\]

Hence \(H\) is a design matrix constructed from polynomial terms of degree \(0,2,4,\dots\) (in general: \(0,\theta, \theta + \phi, \theta + 2\phi,\dots\)) evaluated at the observed points \(h, h/2,h/4,h/8, \dots\).

In other words, dependant on the step-size of the derivative (\(h\)), we fit a polynomial model to the derivative estimate and approximate the true derivative using the fitted intercept.

The usual estimate is then given by \(\hat{L} := e_1^T (H^T H)^{-1} H^T g\) which is equal to \(e_1^T H^{-1} g = \sum_{i} \{H^{-1}\}_{1,i} g_i\) in case \(H\) is regular.

Did we improve the error rate?#

Let us first consider the error function \(\eta: h \to \eta (h)\) in more detail. We see that

\[ \eta(h) = g(h) - \tilde{g}(h) = L + \sum_{i=0}^{\infty} a_i h^{\theta +i \phi} - (L + \sum_{i=0}^{1} a_i h^{\theta +i \phi}) = \sum_{i=2}^{\infty} h^{\theta +i \phi} = \mathcal{O}(h^{\theta +2 \phi}) \,. \]

Now consider the case where \(H\) is regular (which happens here when \(H\) is quadratic). We then have, using (\(\ast\))

\[\begin{split} g = H \begin{bmatrix} L \\ a_0 \\ a_1 \end{bmatrix} + \eta \implies H^{-1} g = \begin{bmatrix} L \\ a_0 \\ a_1 \end{bmatrix} + H^{-1} \eta \end{split}\]

To get a better view on the error rate consider our ongoing example again.

Example (contd.)#


\[\begin{split} H = \begin{bmatrix} 1 & h^2 & h^4 \\ 1 & (h/2)^2 & (h/2)^4 \\ 1 & (h/4)^2 & (h/4)^4 \\ \end{bmatrix} \end{split}\]

we get

\[\begin{split} H^{-1} = \frac{1}{45} \begin{bmatrix} 1 & -20 & 64\\ -20/h^2 & 340/h^2 & -320/h^2\\ 64/h^4 & -320/h^4 & 256/h^4 \end{bmatrix} \end{split}\]

Further, since for central differences \(\theta = \phi = 2\) we have \(\eta (h_n) = \mathcal{O}(h^6)\) for all \(n\) and thus:

\[\begin{split} H^{-1} \eta = H^{-1} \begin{bmatrix} \eta(h) \\ \eta (h/2) \\ \eta (h/4) \\ \end{bmatrix} = \begin{bmatrix} \mathcal{O}(h^6) \\ \dots \\ \dots \\ \end{bmatrix} \implies \hat{L} = \{H^{-1} g \}_1 = L + \mathcal{O}(h^6) \end{split}\]

And so indeed we improved the error rate.