A favorite proof of mine

There are a lot of proofs in mathematics. Many of them serve to verify what we intuitively know to be true, some of them shed light on new methods, and some reveal new ways to view old ideas. Some proofs leave us with a headache, some leave us bored, and some leave us with wonder. In this blog post I will share a beautiful proof leading to a closed formula for the nn-th Fibonacci number, taking us on a detour into functional programming and linear algebra.

The Fibonacci Numbers

The Fibonacci numbers are a sequence of numbers starting with a zero followed by a one where each consequent number is the sum of the last two: 0,1,1,2,3,5,8,130, 1, 1, 2, 3, 5, 8, 13 \dots. If we wanted to be more precise we could define a mathematical sequence {f}n=0\{f\}_{n=0}^{\infty} by the following recursive definition:

fn={0  if  n=01  if  n=1fn2+fn1  otherwisef_n = \begin{cases} 0\;\textrm{if}\;n=0 \\ 1\;\textrm{if}\;n=1 \\ f_{n-2} + f_{n-1}\;\textrm{otherwise} \end{cases}

The Fibonacci numbers have become a bit of a poster child for recursive definitions, perhaps due to it’s simplicity. You’ll surely find it in the early chapters of most books teaching functional programming (a programming paradigm where recursive definitions are common).

Indeed, if we open Chapter 5: Recursion of LYAH we are greeted with the following.

Definitions in mathematics are often given recursively. For instance, the Fibonacci sequence is defined recursively.

Likewise, in Chapter 1.2.2 Tree Recursion of SICP we are yet again greeted by a familiar face.

Another common pattern of computation is called tree recursion. As an example, consider computing the sequence of Fibonacci numbers

With this in mind, it might come as a surprise that there is a closed, non-recursive, formula for the nn-th Fibonacci number. Perhaps more surprising is that we will discover this formula by using the ideas presented in the above chapter of SICP.

Programmatically calculating the nn-th Fibonacci number

A naive way of calculating the nn-th Fibonacci number is to use the definition above. Check if n=0n = 0, if n=1n = 1, and otherwise calculating fn2f_{n-2} and fn1f_{n-1}. This corresponds to the following Haskell code:

fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n-2) + fib (n-1)

However, there is an issue with this method, many Fibonacci numbers will be calculated numerous times, as for each Fibonacci number evaluated we split into two paths, evaluating the previous and twice previous Fibonacci number. The reader which prefers visuals might appreciate Figure 1.5 from the SICP chapter.

How might we fix this then? A human calculating the nn-th Fibonacci number might construct a list of Fibonacci numbers, calculating each Fibonacci number only once. While it is possible to do this on the computer it is superfluous to carry around all previous numbers, as we only need the previous two to calculate the next one. We might think of this as a 2-slot window, moving along the Fibonacci numbers, taking nn steps to arrive at fnf_n. In code we could represent this as follows:

--        steps   -> f n-2   -> f n-1   -> f n
window :: Integer -> Integer -> Integer -> Integer
window 0     a b = a
window steps a b = window (steps-1) b (a+b)

fib :: Integer -> Integer
fib n = window n 0 1

In each step we move the window by replacing the first slot in the window by what was previously in the second slot and filling the new second slot of the window with the sum of the previous two slots. This is then repeated nn times, and then the first slot of the window is returned.

Mathematically calculating the nn-th Fibonacci number

What does this have to do with mathematics, and this beautiful proof that I have promised? We shall begin to translate this moving window into the language of mathematics, our window is a pair of numbers, so why not represent it as a vector. Furthermore, we may view sliding our window one step as a function SS from vectors to vectors. This poses an interesting question: is this function a linear transformation?

S([ab])+S([cd])=[ba+b]+[dc+d]S \left(\begin{bmatrix} a & b \end{bmatrix}\right) \\ + S\left(\begin{bmatrix} c & d \end{bmatrix}\right) \\ = \begin{bmatrix} b a + b \end{bmatrix} \\ + \begin{bmatrix} d c + d \end{bmatrix} =[b+da+b+c+d]== \begin{bmatrix} b + d \\ a + b + c + d \end{bmatrix} = S([a+cb+d])=S([ab]+[cd])S\left(\begin{bmatrix} a + c \\ b + d \end{bmatrix}\right) = S\left(\begin{bmatrix} a \\ b \end{bmatrix} + \begin{bmatrix} c \\ d \end{bmatrix}\right) S(α[ab])=[αbα(a+b)]=α[ba+b]=αS([ab])S\left(\alpha \begin{bmatrix} a \\ b \end{bmatrix}\right) = \begin{bmatrix} \alpha b \\ \alpha (a + b) \end{bmatrix} = \alpha \begin{bmatrix} b \\ a + b \end{bmatrix} = \alpha S\left(\begin{bmatrix} a \\ b \end{bmatrix}\right)

It is! This is great news as it means we can represent our step function by a matrix. With some basic linear algebra one can deduce that

S=[0111]S = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}

Then to calculate the nn-th Fibonacci number we take the starting window [01]\begin{bmatrix} 0 \\ 1 \end{bmatrix} and multiply it by SnS^n. Now that the sliding window has been expressed purely in the language of linear algebra we may apply the tools of linear algebra.

Applying the tools of linear algebra

If you’re familiar with linear algebra there might be a part of your brain yelling “diagonalization” right now. We’ve translated our problem into linear algebra, but even for a small matrix calculating SnS^n can become costly for high nn. Diagonalization is a technique in which we express a matrix in a base where all base vectors are eigenvectors of the original matrix. The benefit of doing this is that it turns exponentiation of matrices, which is hard to calculate into exponentiation of scalars, which is much easier to calculate.

An eigenvector for our matrix SS is a vector v^\hat v for which Sv^=λv^S \hat v = \lambda \hat v for some scalar λ\lambda, which we call an eigenvalue. If there are any such vectors we can find them using their definition.

Sv^=λv^=λI2v^S \hat v = \lambda \hat v = \lambda I_2 \hat v

Subtracting λI2v\lambda I_2 v from both sides yields:

0=Sv^λI2v^=(SλI2)v^0 = S \hat v - \lambda I_2 \hat v = (S-\lambda I_2) \hat v

An equation of the form 0=Au^0 = A \hat u will only have non-trivial solutions if the column vectors of AA are linearly dependent, that is if det(A)=0\textrm{det}(A) = 0. Thus we can find all scalars λ\lambda for which there are non-trivial vector solutions by solving det(SλI2)=0\textrm{det}(S-\lambda I_2) = 0. Because of this property the polynomial det(AλI)\textrm{det}(A-\lambda I) is called the characteristic polynomial of AA.

In our case we have the following:

det(SλI2)=λ111λ=λ2λ1=0\textrm{det}(S-\lambda I_2) = \begin{vmatrix} - \lambda & 1 \\ 1 & 1 - \lambda \end{vmatrix} = \lambda^2 - \lambda - 1 = 0

Solving for λ\lambda yields two eigenvalues:

λ0=152,  λ1=1+52\lambda_0 = \frac{1 - \sqrt 5}{2} ,\; \lambda_1 = \frac{1 + \sqrt 5}{2}

Would you look at that, 1+52\frac{1 + \sqrt 5}{2}, the golden ratio! Some of you might already know that the golden ratio is connected to the Fibonacci numbers, in fact, as you get further and further into the sequence of the Fibonacci numbers the ratio fn+1fn\frac{f_{n+1}}{f_n} approaches 1+52\frac{1 + \sqrt 5}{2}.

Now we can solve (SλI2)v^=0(S-\lambda I_2) \hat v = 0 for λ0\lambda_0 and λ1\lambda_1, and if the two resulting vectors are linearly independent we may use them as the basis of our diagonalization matrix. Gauss elimination yields:

v^0=[251],  v^1=[25+1]\hat v_0 = \begin{bmatrix} -2 \\ \sqrt 5 - 1 \end{bmatrix},\; \hat v_1 = \begin{bmatrix} 2 \\ \sqrt 5 + 1 \end{bmatrix}

These vectors are indeed linearly independent, and we can use them as basis vectors for our diagonal matrix. We will now to write S=BDB1S = BDB^{-1} where B=[22515+1]B = \begin{bmatrix} -2 & 2 \\ \sqrt 5 - 1 & \sqrt 5 + 1 \end{bmatrix} D=[152001+52]D = \begin{bmatrix} \frac{1 - \sqrt 5}{2} & 0 \\ 0 & \frac{1 + \sqrt 5}{2} \end{bmatrix}

We then have that

Sn=(BDB1)n=BDB1BDB1BDB1nS^n = (BDB^{-1})^n = \underbrace{BDB^{-1} BDB^{-1} \dots BDB^{-1}}_n =BDIDIDB1n=BDnB1= \underbrace{BDIDI \dots DB^{-1}}_n = BD^nB^{-1}

Which is very nice since

Dn=[152001+52]n=[(152)n00(1+52)n]D^n = \begin{bmatrix} \frac{1 - \sqrt 5}{2} & 0 \\ 0 & \frac{1 + \sqrt 5}{2} \end{bmatrix}^n = \begin{bmatrix} \left(\frac{1 - \sqrt 5}{2}\right)^n & 0 \\ 0 & \left(\frac{1 + \sqrt 5}{2}\right)^n \end{bmatrix}

After calculating B1B^{-1} we can solve [fn+1fn]=BDnB1\begin{bmatrix} f_{n+1} \\ f_n \end{bmatrix} = BD^nB^{-1} for fnf_n to get our closed expression.

fn=15((1+52)n(152)n)f_n = \frac{1}{\sqrt 5} \left(\left(\frac{1 + \sqrt 5}{2}\right)^n - \left(\frac{1 - \sqrt 5}{2}\right)^n\right)

Final thoughts

Whenever I first happened upon the closed formula for the nn-th Fibonacci number it seemed so shockingly random, a formula with bunch of square roots always giving me a recursively specified integer. After I learned this proof it doesn’t feel as random anymore, instead, I feel it would be more surprising if we carried out the whole diagonalization process and ended up with no roots. Perhaps more importantly, it opened my eyes to the usage of linear algebra as a powerful mathematical tool, and not just something to be applied in geometry, flow balancing or computer graphics.

Addendum

It was pointed out to me on mastodon that this technique is of interest even if it is not possible to diagonalize the stepping matrix. This is because using fast binary exponentiation one can perform matrix exponentiation in logarithmic time. Thus any linearly recursive algorithm can be calculated in logarithmic time!

Fast binary exponentiation uses the identity A2k=(AA)kA^{2k} = (A \cdot A)^k, thus we can split the exponent in 2 when even, rather than performing 2k2k multiplications. Recursively doing this each time the exponent is even yields logarithmic time exponentiation.