Scalar Root Finding

Introduction

One of the oldest and simplest problems in numerical methods is to find the solution (root) of a scalar equation:

f(x) = 0

The ubiquity of calculators and mathematical software makes it easy to take the most common cases of this for granted, for example computing square roots.

Square root
The Babylonian tablet YBC 7289

Methods for computing square roots numerically have been around for at least 4,000 years, as shown by the Babylonian tablet YBC 7289. The Babylonians used a base-60 number system, and their approximation for the square root of two was

\sqrt{2} \approx 1 + \frac{24}{60} + \frac{51}{3600} + \frac{10}{216000} = 1.41421\overline{296}

which is the optimal approximation using four base-60 numerals.

The tablet unfortunately does not contain a complete description of a square root method; the best guess for what they were doing is some variant of this iterative method for calculating \sqrt{Q}:

x_{n+1} = \frac{1}{2} \left ( x_n + Q/x_n \right )

which is sometimes called the Babylonian method. For the square root of 2, if we start from x_0 = 1, we obtain a sequence of approximations:

x_1 = \frac{1}{2}(1 + 2) = 3/2 \approx 1.5
x_2 = \frac{1}{2}(3/2 + 4/3) = 17/12 \approx 1.4166666666666667
x_3 = \frac{1}{2}(17/12 + 24/17) = 577/408 \approx 1.4142156862745099
x_4 = \frac{1}{2}(577/408 + 816/577) = \frac{665857}{470832} \approx 1.4142135623746899
x_5 = \frac{1}{2}(\frac{665857}{470832} + \frac{941664}{665857}) = \frac{886731088897}{627013566048} \approx 1.4142135623730951

and already x_5 is correct to 16 digits.
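In modern terms, the Babylonian iteration is only a few lines of code. Here is a minimal Python sketch (the function name, starting guess, and step count are illustrative choices, not anything prescribed by the tablet):

    def babylonian_sqrt(Q, x0=1.0, steps=5):
        # Approximate sqrt(Q) with the iteration x <- (x + Q/x)/2.
        x = x0
        for _ in range(steps):
            x = 0.5 * (x + Q / x)
        return x

    print(babylonian_sqrt(2.0))   # 1.4142135623730951, matching x_5 above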

In this section we will examine a variety of methods for finding roots, and their pros and cons.

Later on we will return to this problem in the multivariate setting.

The Bisection Method

The bisection method is very simple, but has some advantages over other methods. To use it, we need an interval on which the function f(x) changes sign. So suppose we know that f(a) and f(b) have opposite signs (i.e. f(a)f(b) < 0). Let c = (a+b)/2, the midpoint of a and b. If the sign of f(c) is the same as the sign of f(a), then replace a by c; otherwise replace b by c. We repeat this process until we are close enough (this could mean that the interval (a,b) is sufficiently small, or that f(a) and f(b) are sufficiently small).


Example (Bisection method)

Suppose we use the bisection method to approximate the square root of 2, starting with the fact that it is between 1 and 2. So we can use f(x) = x^2 - 2, and a=1, b=2. Several steps of bisection would be:

1) c = (a+b)/2 = 3/2, and f(c) = 1/4 > 0, so replace b by c

2) c = (a+b)/2 = 5/4, and f(c) = -7/16 < 0, so replace a by c

3) c = (a+b)/2 = 11/8, and f(c) = -7/64 < 0, so replace a by c

4) c = (a+b)/2 = 23/16 \approx 1.4375

After these four steps our approximation has a relative error of about 1.6 percent.
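A short Python sketch of the bisection loop just described (the tolerance, the assumption that a < b, and returning the final midpoint are illustrative choices):

    def bisection(f, a, b, tol=1e-12):
        # Find a root of f in [a, b], assuming a < b and that
        # f(a) and f(b) have opposite signs.
        if f(a) * f(b) >= 0:
            raise ValueError("f(a) and f(b) must have opposite signs")
        while b - a > tol:
            c = 0.5 * (a + b)
            if f(a) * f(c) > 0:    # f(c) has the same sign as f(a): replace a
                a = c
            else:                  # otherwise replace b
                b = c
        return 0.5 * (a + b)

    print(bisection(lambda x: x**2 - 2, 1.0, 2.0))   # approximately 1.414213562373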


The bisection method will relentlessly converge to a solution at a fixed rate as long as the function f(x) is continuous on the interval [a,b] and changes sign on it. To compare it to other methods, which can be faster under some circumstances, we will use the idea of the "order of convergence" of a method.

Error Analysis: Order of Convergence

A method has order q if

\lim_{k \rightarrow \infty} \frac{|a_{k+1} - r|}{|a_{k} - r|^q} < M

for some M, where the a_k are the successive approximations to a root r (i.e. f(r) = 0). Usually if we say a method is of order q we are referring to the largest q for which the inequality holds.

Order of convergence of the bisection method

Suppose we begin the bisection method with a_0 = a, a_1 = b, and define d_1 = |b - a|. At the kth iterate, the size of the interval will be d_k = 2^{-k+1} d_1, and |a_k - r| < d_{k}, since the intervals at each step are contained in the previous one and both a_k and r lie in the current interval. So the absolute error is bounded by a quantity that is divided by two at each step,

\frac{d_{k+1}}{d_{k}} = \frac{1}{2}

which means the order q is at least 1. If we examine a simple case such as f = x - 1/3 it is easy to see that q is not larger than one, since \lim_{k \rightarrow \infty} \frac{|a_{k+1} - r|}{|a_{k} - r|} = \frac{1}{2}.

Secant/regula falsi method

As with bisection, we start with two points a and b. If these bracket a root (i.e. if f(a)f(b)<0) then we choose to update a or b so as to preserve this bracketing property - in this case the method is called the Regula Falsi, or Method of False Position. If instead we always keep the two most recent points, discarding the older one, and do not necessarily start with a bracketed root, then the method is called the Secant method. The updated point has the same formula in both cases.

We derive the formula by assuming that we want to use the secant line through the points (a,f(a)) and (b,f(b)) to estimate the function. In point-slope form, we can write this line as

y = \frac{f(b) - f(a)}{b - a}(x - a) + f(a)

Setting y equal to 0, and x to a new estimate c for the root, we can solve for c to get

c = -f(a) \frac{b - a}{f(b) - f(a)} + a

which we can simplify to the more symmetric form:

c = \frac{a f(b) - b f(a)}{f(b) - f(a)} = a \frac{f(b)}{f(b) - f(a)} + b \frac{-f(a)}{f(b) - f(a)}

The second form of this formula shows that the new point c is a weighted combination of a and b, with the weights determined by the relative magnitudes of f at the two points; when f(a) and f(b) have opposite signs both weights are positive, so c is a convex combination lying between a and b. This results in faster convergence to the root once f is approximately linear on the interval [a,b]. Unfortunately, it can be slower to converge if f is far from linear in that interval.
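The two variants are easy to express in Python using the formula above. This is an illustrative sketch only (the fixed step counts and the guard against a horizontal secant line are my own choices):

    def secant_update(a, fa, b, fb):
        # New estimate from the secant line through (a, f(a)) and (b, f(b)).
        return (a * fb - b * fa) / (fb - fa)

    def secant(f, a, b, steps=10):
        # Secant method: always keep the two most recent points.
        for _ in range(steps):
            fa, fb = f(a), f(b)
            if fa == fb:                       # horizontal secant line (or converged)
                break
            a, b = b, secant_update(a, fa, b, fb)
        return b

    def regula_falsi(f, a, b, steps=10):
        # False position: keep a bracket, assuming f(a) and f(b) have opposite signs.
        c = a
        for _ in range(steps):
            c = secant_update(a, f(a), b, f(b))
            if f(a) * f(c) > 0:
                a = c
            else:
                b = c
        return c

    f = lambda x: x**2 - 2
    print(secant(f, 1.0, 2.0))         # converges rapidly, about 1.414213562373095
    print(regula_falsi(f, 1.0, 2.0))   # about 1.4142136; slower here, since one endpoint stays fixed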

Order of convergence of the Secant method

The order of convergence of the Secant Method is a little tricky to compute, but the result is (at least to me) surprising and beautiful. Let's denote the sequence of approximations as

x_{i-1}, x_{i}, x_{i+1}, \ldots

and write them in terms of the root r (i.e. f(r)=0) as x_i = r + e_i so we have a sequence of signed errors e_i. Our goal is to write the magnitude of the (i+1)th error, |e_{i+1}|, in terms of the previous errors' magnitudes. We will make a couple of assumptions to make this calculation easier: we assume that f has a convergent power series around the root r, and that f'(r) and f''(r) are nonzero. The assumption that f'(r) is not zero is especially important - the result we will derive is not true if f'(r)=0. In general, such degenerate roots are much harder to accurately compute.

First we compute e_{i+1} = x_{i+1} - r from the secant formula:

e_{i+1} = \frac{x_{i-1}f(x_i) - x_i f(x_{i-1})}{f(x_{i}) - f(x_{i-1})} - r

and replace x_i = e_i + r and x_{i-1} = e_{i-1} + r, and expand the function f at each point to second order using its power series. So for example we will use the approximation

f(x_i) \approx f(r) + f'(r)(x_i-r) + f''(r) (x_i-r)^2/2
= f'(r)e_i + f''(r)e_i^2/2

and we get

e_{i+1} \approx \frac{(e_{i-1} + r)[f'(r)e_i + f''(r)e_i^2/2] - (e_i + r)[f'(r)e_{i-1} + f''(r)e_{i-1}^2/2]}{f'(r)(e_i-e_{i-1}) + f''(r)(e_i^2 - e_{i-1}^2)/2} - r

This looks kind of scary perhaps, but it simplifies in a number of ways. In the fraction, the numerator and the denominator each contain a factor of (e_i - e_{i-1}) that we can cancel. We can also simplify this by introducing the quantity Q = 2f'(r)/f''(r), which by assumption is nonzero. Then we have

e_{i+1} \approx \frac{Q r + e_i e_{i-1} + r(e_i + e_{i-1})}{Q + (e_i + e_{i-1})} - r
= \frac{e_i e_{i-1}}{Q + e_i + e_{i-1}}

Since Q is fixed and nonzero, the denominator will be eventually well approximated by Q, and we have

e_{i+1} \approx e_i e_{i-1}/Q

If the method has order \alpha, then |e_{i+1}| \approx k |e_i|^{\alpha}. Substituting this in, we get

k |e_i|^{\alpha} \approx |e_i | |e_{i-1}|/ |Q|

Let's denote 1/|Q| by C, and then if we substitute again we obtain

k^{\alpha+1} |e_{i-1}|^{\alpha^2} \approx k C |e_{i-1}|^{\alpha + 1}

For this to hold we need the constants to match, k^{\alpha} = C, and the exponents to match, \alpha^2 = \alpha + 1, so the order of convergence is

\alpha = \frac{1 + \sqrt{5}}{2} \approx 1.618

since that is the root of \alpha^2 - \alpha - 1 that is larger than 1. This number is famous - it is often called "the golden ratio". Since it is larger than 1, the Secant Method will eventually converge much faster than the bisection method, although in practice it may take quite a few iterations before it is close enough for this to happen.
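We can see this numerically. The following sketch (an illustration with the test function f(x) = x^2 - 2, not anything canonical) estimates the order from three consecutive errors. The early estimates are quite rough, and after six or seven iterations roundoff takes over, but the later ones hover roughly in the 1.5 to 1.7 range, consistent with \alpha \approx 1.618:

    import math

    f = lambda x: x**2 - 2
    r = math.sqrt(2)

    xs = [1.0, 2.0]                    # two starting points
    for _ in range(7):                 # a handful of secant steps
        a, b = xs[-2], xs[-1]
        if f(a) == f(b):
            break
        xs.append((a * f(b) - b * f(a)) / (f(b) - f(a)))

    errs = [abs(x - r) for x in xs]
    for i in range(2, len(errs) - 1):
        if errs[i + 1] == 0 or errs[i] == errs[i - 1]:
            break
        # empirical order estimate: log(e_{i+1}/e_i) / log(e_i/e_{i-1})
        print(math.log(errs[i + 1] / errs[i]) / math.log(errs[i] / errs[i - 1]))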

Newton's Method

The idea of Newton's method is to use the tangent-line approximation to the function f(x) to estimate the root at each step. This works extremely well if the function f is differentiable and has a nearby simple root. The formula at each step is:

x_{n+1} = x_{n} - \frac{f(x_n)}{f'(x_n)}

Let us compute the order of convergence of Newton's method in the simplest case, in which we assume that we are converging to a simple root r of f(x), so that f'(r) \neq 0.

Now we expand the approximation a_{k+1} around r:

a_{k+1} = a_k - \frac{f(a_k)}{f'(a_k)} \approx a_k - \frac{f'(r)(a_k-r) + f''(r)(a_k-r)^2/2}{f'(r) + f''(r)(a_k-r)}
= a_k - \frac{a_k-r + f''(r)(a_k-r)^2/(2 f'(r))}{1 + f''(r)(a_k-r)/f'(r)}
\approx a_k - \left (a_k-r + f''(r)(a_k-r)^2/(2 f'(r)) \right ) \left ( 1 - f''(r)(a_k-r)/f'(r) \right )
\approx r + f''(r)(a_k-r)^2/(2 f'(r))

So the error at the k+1 step is approximately f''(r)(a_k-r)^2/(2 f'(r)), while the previous error is a_k - r, and the order is q=2. This quadratic convergence makes Newton's method much more efficient than the bisection or secant methods as long as the root is simple, and as long as we start close enough to the root.
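A minimal Python sketch of the iteration (the stopping test on the step size and the iteration cap are arbitrary illustrative choices; a real implementation would also guard against f'(x) = 0):

    def newton(f, fprime, x0, tol=1e-12, max_steps=50):
        # Newton's method: follow the tangent line at each step.
        x = x0
        for _ in range(max_steps):
            step = f(x) / fprime(x)
            x = x - step
            if abs(step) < tol:
                break
        return x

    # The square root of 2 again: f(x) = x^2 - 2, f'(x) = 2x
    print(newton(lambda x: x**2 - 2, lambda x: 2 * x, 1.0))   # about 1.4142135623730951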


Example

Let's find a root of the function f(x) = x^5 - x^3 + 2 x - 1 using Newton's method. The derivative of f is 5 x^{4} - 3 x^{2} + 2, so at each step we will update our approximation x_n to

x_{n+1} = x_n - \frac{x_n^5 - x_n^3 + 2 x_n - 1}{5 x_n^{4} - 3 x_n^{2} + 2} = \frac{4 x_n^{5} - 2 x_n^{3} + 1}{5 x_n^{4} - 3 x_n^{2} + 2}

If we start at x_0 = 0, we obtain the following sequence of approximations

x_1 = \frac{1}{2} \approx 0.500000000000000
x_2 = \frac{14}{25} \approx 0.560000000000000
x_3 = \frac{8486921}{15145750} \approx 0.560349999174686
x_4 = \frac{346329259168234949703667797348544702}{618058826505985730776709611449201875} \approx 0.5603499931003426

After this point the exact fraction representation becomes a little ridiculous. In a standard floating-point (inexact) representation, the next approximation is unchanged from x_4. We can check that

f(x_4) \approx -0.00000000000000015959455978986625 = -1.5959455978986625 \cdot 10^{-16}

is close to zero. In the next chapter we will consider some of the issues with inexact representations of numbers.
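These exact iterates are easy to reproduce with Python's fractions module; this is just an illustrative check of the arithmetic above:

    from fractions import Fraction

    x = Fraction(0)                     # x_0 = 0
    for _ in range(4):
        x = (4 * x**5 - 2 * x**3 + 1) / (5 * x**4 - 3 * x**2 + 2)
        print(x, float(x))              # prints 1/2, 14/25, 8486921/15145750, ...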


Multiple (degenerate) roots

In deriving the order of convergence for Newton's Method and the Secant Method we assumed that f'(r) \neq 0 at the root r. If this is not the case, then we have a multiple (also called degenerate) root. Such roots are intrinsically more difficult to compute, and most methods will have a lower order of convergence for them. If there is still a change of sign around the root, then the bisection method may be the best option, although even in that case there may be issues with precision.

Let us examine what happens with Newton's Method if the function has an expansion f(x) = f^{(m)}(r)(x-r)^m/m!+ higher order terms where the mth derivative f^{(m)}(r) \neq 0. Then

a_{k+1} = a_k - \frac{f(a_k)}{f'(a_k)} \approx a_k - \frac{f^{(m)}(r)(a_k-r)^m/m!}{f^{(m)}(r)(a_k-r)^{m-1}/(m-1)!} = a_k - \frac{a_k - r}{m}

and the errors e_k = a_k - r are related approximately by

e_{k+1} = \frac{m-1}{m} e_k

which means the method has order of convergence 1, and the linear rate of convergence gets worse as m increases.
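A quick numerical illustration (the test function f(x) = (x-1)^3 is my own choice): Newton's method on a triple root should shrink the error by a factor of about (m-1)/m = 2/3 per step.

    f = lambda x: (x - 1) ** 3            # triple root at x = 1, so m = 3
    fprime = lambda x: 3 * (x - 1) ** 2

    x = 2.0
    for _ in range(10):
        x_new = x - f(x) / fprime(x)
        print(abs(x_new - 1) / abs(x - 1))   # ratio of successive errors, about 2/3
        x = x_new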

Inverse Quadratic and Brent's Method

Both the secant method and Newton's method use a linear approximation to f to estimate the root location. It is natural to try to extend this with a nonlinear approximation to f; probably the simplest choice is a quadratic approximation. This runs into a problem, however, since the interpolating quadratic y = a_0 + a_1 x + a_2 x^2 through the three points (x_i, f(x_i)), (x_{i+1}, f(x_{i+1})), (x_{i+2}, f(x_{i+2})) may not have a real root at all, and this is a more serious robustness problem than the analogous issue with linear approximations (which only lack a real root if they are horizontal).

The quadratic approximation does work if instead we use a sideways parabola x = a_0 + a_1 y + a_2 y^2, which always crosses the line y = 0 at exactly one point. We can think of this as a quadratic approximation to the inverse of f, which is why it is called the inverse quadratic method.

Computing the recursion formula for the inverse quadratic method involves a lot of basic algebra, after which we obtain

x_{i+3} = \frac{f_{i+1} f_{i+2}}{(f_i - f_{i+1})(f_i - f_{i+2})} x_i + \frac{f_{i} f_{i+2}}{(f_{i+1} - f_{i})(f_{i+1} - f_{i+2})} x_{i+1} + \frac{f_{i} f_{i+1}}{(f_{i+2} - f_{i})(f_{i+2} - f_{i+1})} x_{i+2}

where for brevity we denote f_i = f(x_i), f_{i+1} = f(x_{i+1}), and f_{i+2} = f(x_{i+2}). To start this process we need three points.
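A direct transcription of this formula into Python might look like the sketch below (illustrative only; a robust implementation, as in Brent's method, must handle nearly equal function values, where the divisions blow up):

    def inverse_quadratic_step(x0, x1, x2, f):
        # One inverse quadratic interpolation step from three points
        # whose function values are all distinct.
        f0, f1, f2 = f(x0), f(x1), f(x2)
        return (x0 * f1 * f2 / ((f0 - f1) * (f0 - f2))
                + x1 * f0 * f2 / ((f1 - f0) * (f1 - f2))
                + x2 * f0 * f1 / ((f2 - f0) * (f2 - f1)))

    f = lambda x: x**2 - 2
    print(inverse_quadratic_step(1.0, 1.5, 2.0, f))   # about 1.4095, already near sqrt(2)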

The inverse quadratic method can be less efficient than the simpler methods we have looked at, so it is not often used by itself. It is used as a component of the hybrid Brent's Method, variants of which are common in numerical software libraries.

Ridders' Method

An interesting root-finding method that avoids the use of the derivative was suggested by Ridders in 1979. It is based on factoring an exponential out of the function f(x) so that the remaining values lie close to a straight line. Like the Regula Falsi method, it preserves a bracket around the root, but it converges faster for smooth functions.

Assume a and b bracket a root of f(x). Denote the midpoint of a and b by m = (a+b)/2. If s is the sign of f(a) (either 1 or -1), then the new approximation point will be

c = m + s(m-a)\frac{f(m)}{\sqrt{f(m)^2 - f(a)f(b)}}

Then, as in the Regula Falsi method, c replaces whichever of a or b has a function value with the same sign as f(c).
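One step of this update in Python might look like the following sketch (the bookkeeping that decides which endpoint to replace, and any convergence test, are left out):

    import math

    def ridders_step(f, a, b):
        # One Ridders update, assuming f(a) and f(b) have opposite signs.
        m = 0.5 * (a + b)
        s = 1.0 if f(a) > 0 else -1.0       # the sign of f(a)
        return m + s * (m - a) * f(m) / math.sqrt(f(m)**2 - f(a) * f(b))

    f = lambda x: x**2 - 2
    print(ridders_step(f, 1.0, 2.0))   # about 1.4130, already close to sqrt(2)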

Wilkinson's Example

A famous example of the care needed to approximate roots is due to James H. Wilkinson, who considered the seemingly simple example of

P_W = \prod_{i=0}^{20} (x-i) = x (x-1) (x-2) \ldots (x-20)
Wilkinson polynomial plot
Graph of the Wilkinson polynomial over the interval containing its roots.

If we were working from the expanded form of the polynomial and did not know its roots, it would look considerably more intimidating:

P_W = x^{21} - 210 \, x^{20} + 20615 \, x^{19} - 1256850 \, x^{18} + 53327946 \, x^{17} - 1672280820 \, x^{16} + 40171771630 \, x^{15}
- 756111184500 \, x^{14} + 11310276995381 \, x^{13} - 135585182899530 \, x^{12} + 1307535010540395 \, x^{11}
- 10142299865511450 \, x^{10} + 63030812099294896 \, x^{9} - 311333643161390640 \, x^{8}
+ 1206647803780373360 \, x^{7} - 3599979517947607200 \, x^{6} + 8037811822645051776 \, x^{5}
- 12870931245150988800 \, x^{4} + 13803759753640704000 \, x^{3}
- 8752948036761600000 \, x^{2} + 2432902008176640000 \, x

If we zoom in to plot this polynomial near the root at x=10, we can see that a standard evaluation of it starts to become inaccurate:

Wilkinson polynomial plot zoom 1
Graph of the Wilkinson polynomial over the interval (9.9,10.1).

Zooming in by another factor of ten, it becomes clear that numerical accuracy is a serious issue:

Wilkinson polynomial plot zoom 2
Graph of the Wilkinson polynomial over the interval (9.99,10.01).
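We can observe the same thing without any plotting. The sketch below builds the exact integer coefficients of P_W, then evaluates the expanded form with Horner's rule in double precision near x = 10 and compares it with the product form; the particular sample points are arbitrary illustrative choices:

    # Exact integer coefficients of prod_{i=0}^{20} (x - i), highest power first.
    coeffs = [1]
    for i in range(21):
        shifted = coeffs + [0]                # multiply the current polynomial by x
        for k in range(len(coeffs)):
            shifted[k + 1] -= i * coeffs[k]   # ...and subtract i times it
        coeffs = shifted

    def horner(coeffs, x):
        # Evaluate the expanded polynomial at x using Horner's rule.
        result = 0.0
        for c in coeffs:
            result = result * x + c
        return result

    def product_form(x):
        result = 1.0
        for i in range(21):
            result *= (x - i)
        return result

    for x in (9.995, 10.0, 10.005):
        # Near x = 10 the expanded form is contaminated by rounding error,
        # while the product form is still accurate.
        print(x, horner(coeffs, x), product_form(x))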

This sort of problem is not unusual, and provides motivation for examining how we represent numbers on a computer, which is the subject of the next section.

Exercises

  1. Approximate the cube root of 2 by using the function f = x^3 - 2 with starting points x_0 = 0 and x_1 = \frac{3}{2}, with

    1. Three steps of the bisection method.

    2. Three steps of the method of false position (regula falsi).

  2. Compare the above results with using three steps of Newton's method, starting at x_0 = \frac{3}{2}. Which method is most accurate after three steps?

Review

  1. Compute 1/z in the form a + bi, with a and b real numbers, if z = 2 + 3i.

  2. Multiply the following matrices to find C = AB.

    1. A = \left ( \begin{array}{rrr} 1 & 3 & 0 \\ 1 & 0 & 2 \end{array} \right ), \ \ B= \left ( \begin{array}{rr} 1 & -2 \\ 0 & 1 \\ -3 & 3 \\ \end{array} \right )
    2. A = \left ( \begin{array}{r} 5 \\ -1 \\ -2 \\ \end{array} \right ), \ \ B = \left ( \begin{array}{rrr} 1 & 0 & -1 \end{array} \right )
  3. For the system of equations u^2 + 4v^2 = 4, 4u^2 + v^2 = 4, use two steps of the multivariate Newton's method to approximate the solution, starting at x_0 = (1,1).

  4. For the same system use two steps of the Broyden II method to approximate the solution, starting at x_0 = (1,1).