This is another post in connection with some slides shown in the lecture, which may have been a bit obscure.

The text here is concretely triggered by a slide in week 9 about “recursion with anonymous procedures”. The slide showed a version of the factorial function programmed in a way unlike any we have seen before (and unlike any we will see afterwards). And programmed in a rather obscure way. The factorial is programmed without recursion, in that there’s no procedure that calls itself, at least not in an obvious way. It only uses $λ$-expressions, i.e., only anonymous functions.

Recap: Coding factorial using only anonymous functions

Let’s start out with a recursive definition of fac. fac is bound to a $λ$-abstraction, and in the procedure body, fac is mentioned and called. Probably we got used to recursive definitions meanwhile that we don’t puzzle about that too much.

(define fac
    (lambda (n)
      (if (= n 1)
	  1
	  (* n (fac (- n 1))))))

Perhaps it’s worth to point out a crucial difference between define and let. It’s not possible to define fac using let as follows:

(let ((fac
       (lambda (n)
	 (if (= n 1)
	     1
	     (* n (fac (- n 1)))))))  ;; note that fac is
                                      ;; introduced via
                                      ;; let!
  <scope where fac is intended to be used>)

Let works similarly as define (though it has an explicitly specified scope): it binds fac to this lambda-expression. However, this time it won’t work as intended, as fac is not yet defined. If you try that example yourself in some scheme interpreter, make sure that fac has not already been defined earlier, otherwise it will look as if it worked insofar the correct value comes out. But in that case, the fac introduced via let simply calls the previously defined let, it’s not a recursive (re-)definition.

While at it: there exists a variant (not discussed in the lecture) of let which would work, it’s called letrec and that would allow an intended recursive definition of fac (and in that respect works analogous to define).

So far so good (and known from the lecture). But now we no longer want to use define to program factorial as above at least not recursively. Nor letrec obviously, nor while or other looping constructs that you favorite Scheme dialect may support (while is often supported. Additionally one can program while easily oneself (using recursion) so that would not help).

Now: let’s look at the $λ$-abstraction in isolation, i.e., the above standard definition just without giving it a name with define.

 (lambda (n)
   (if (= n 0)
       1
       (* n (fac (- n 1)))))

The base case is covered, but the branch that corresponds to the recursion case is not. For $n>0$, the body invokes fac which is undefined, resp. if it happens to be defined by coincidence from earlier, it’s probably not the factorial, as we are still struggling to get it defined. So let’s don’t rely on some unknown thing called fac coming from outside and probably undefined anyway, let’s hand over the missing continuation to cover the recursion case as functional argument:

  (lambda (f)    ;; let's refer to the whole
    (lambda (n)  ;; construction here (a higher-
      (if (= n 0);; order function) as F
	  1
	  (* n (f (- n 1))))))

NB: the term “continuation” has a specific meaning in functional programming and there exists a style of programming which is called CPS, continuation passing style. We are not claiming that the above code is strictly CPS, but there is a connection: We hand over a function that describes how to continue at the end of the function body, here at least that the one possible end that corresponds to the recursive case.

At any rate, let’s refer to the above function as $F$. Given a continuation function $f$ as argument, it corresponds to the body of the factorial.

The base case is covered, and in the recursion case, the body uses the argument $f$ to calculate the return value. Since $f$ is an argument, it can be anything, but what is needed for $n\geq 1$, where recursion should kick in, is to calculate $F$ again, this time with the numerical argument $n-1$. And going through the body of $F$ one more round would probably not be enough. So in the next-round’s recursion case, the same problem would present itself, namely how to continue just another layer of the body, and the solution would be the same yet again: do $F$ one more time, and if needed, still another round, and on and on.

That can be achieved by doing the following in the recursion case, calling $F$ and feeding to that next call to $F$ the function $F$ again, should that next round not be enough:

(* n ((F F) (- n 1)))  ;; recursion case in the body of F

Since $F$ is not only called but additionally handed over as further continuation in the next recursion case, the pattern repeats itself and the pattern can continue arbitrarily long. And for the factorial function, at least with a non-negative input, after a finite amount of repeating itself, the schema will hit the base case for $n=0$, and the correct value of $n!$ will be returned.

We are, however, not out of the woods yet. The previous code snippet mentions $F$, resp. $F F$ in the recursion case. Note that we have not officially named the higher-order function from the above Listing by the name $F$ (doing (define F ....): we agreed among ourselves to call it $F$ in the explanatory text, but not as part of the program.

We could have given the anonymous function officially the name $F$ with define, but what we discussed was that $F$ is used as argument to that function, i.e., in place for the formal parameter $f$. Besides, if we had introduced the name $F$ for the function and then used in the the recursion case, that would be a case of direct recursion using a function’s name, that’s exactly what we don’t want to do.

So: how can we use $F$ as argument to itself, without relying on direct recursion? That’s actually not hard, we just program it two times, and feed the second copy as argument to the first. However, as explained above, we need in the body something to the effect of (* n (F F) (- n 1)). That means, we need to massage the implementation $F$ in such a way, that $n-1$ is not fed as argument to $f$ (as done in $F$ and in the factorial function), but fed as argument to $f f$. That leads to the following massaged version of $F$

(lambda (f)    ;; new variant of F
   (lambda (n)
     (if (= n 0)
	 1
	 (* n ((f f) (- n 1)))))  ;; self-application of argument f

And if we apply that version to itself, we get the following function.

((lambda (f)     
   (lambda (n)
     (if (= n 0)
	 1
	 (* n ((f f) (- n 1))))))
 (lambda (f)    
   (lambda (n)
     (if (= n 0)
	 1
	(* n ((f f) (- n 1)))))))

And we are done, that’s the factorial!

One can test it easily on some input. Of course it looks a bit inelegant, so let’s clean it up a bit. We can introduce a name $F’$ for the massaged version of $F$ and using let to avoid repeating the code, and finally we can give the whole construction a conventional name, namely fac. Note that neither let nor the use of define for fac involves recursion.

(define fac (let ((F'  (lambda (f)    
			 (lambda (n)
			   (if (= n 0)
			       1
			       (* n ((f f) (- n 1))))))))
	      (F' F')))

Factorial is fine and good, but how to generalize that?

The above construction is concretely done for the factorial. Fine as it is, we are interested in doing it generally, i.e., given a recursive definition of a function and turning it to one that works without recursion. And it’s not good enough to understand the way it worked for fac, and when dealing with another recursive definition, do the same trick again for the body of that new function. A convincing generalization would be one that does not involve us, fiddling with the code, like retyping the body $F$ into the massaged version $F’$. Instead,

we want to define a Scheme procedure that takes the body $F$ and directly returns the recursive procedure that corresponds to $F$!

Also that is easy to do (kind of…), though we run into another (small) problem, at least in Scheme and similar settings.

It’s not just desirable to avoid to massage the code of $F$ in $F’$, it is necessary to do the whole trick without having access of the actual code of $F$, because $F$ is a formal parameter of the procedure. This we are forced to treat the functional argument as black box.

Turning $F$ to $F’$ without having access to the code of $F$ is actually quite easy. In the concrete factorial example, the code massage from $F$ to $F’$ “rewrites” the code so that a self-application $f\ f$ was used in $F’$ instead of $f$, as in $F$. We can achieve the same effect from the outside. Instead of feeding $F’$ into $F’$ and have $F’$’s body duplicate the argument $F’$ into a self-application $F’\ F’$, we just do the self-application outside and hand over $F\ F$ as argument.

  (lambda (f) (F (f f)))   ;; corresponds (somehow) to F'

Now we can apply that construction to itself ((lambda (f) (F (f f))) (lambda (f) (F (f f)))), doing the same trick as before in the special setting where $F$ represented the effect of the body of the factorial function. The only thing left to do is to have $F$ as argument to the construction, like the following

(lambda (F)                ;;  F as argument
  (lambda (f) (F (f f)))   ;; corresponds (somehow) to F'
  (lambda (f) (F (f f)))   ;; and is applied to itself

We may write it also in math-notation, i.e., as expression from the $λ$-calculus, and it looks like this

\(\lambda F. ((\lambda f. F\ (f\ f))\ (\lambda f. F\ (f\ f)))\)

[NB: the conventions for when and how to use parentheses in the $λ$-calculus are different from the conventions in Lisp or Scheme. One just has to be careful with that. For instance, if we had written above $F\ f \ f$ instead of $F\ (f\ f)$, it would look as if that corresponded to (F f f) in Scheme, but it does not; it would correspond to ((F f) f) in Scheme (and would not do the job). Just something one needs to keep in mind.]

Anyway, this expression is known in the $λ$-calculus as […drum rolls…]

the $Y$-combinator!

There are slight reformulations of that doing the same (for instance using let). And there are other such functions to achieve recursion, but doing it differently in a more serious manner, one of which we will (have to) look at.

First, let’s take the above $Y$ and try it out in Scheme, giving it its traditional name first

  (define Y (lambda (F)
    ((lambda (x) (F (x x)))
     (lambda (x) (F (x x)))))

resp. let’s use an equivalent reformulation with let, which is slightly shorter

  (define Y
    (lambda (F)
      (let ((f  (lambda (x) (F (x x)))))
	(f f))))

So, it took some meandering, we finally came up with a Scheme procedure that corresponds to the $Y$-combinator, which is known to achieve our goal: turn a procedure body like $F$ into a recursive procedure.

Then let’s reward ourselves and use it to run a version of factorial using the $Y$ combinator. Here’s again the body of the factorial from the beginning (see here):

  (define F (lambda (f)    
	      (lambda (n)
		(if (= n 0)
		    1
		    (* n (f (- n 1)))))))

and then proudly apply our $Y$ combinator to it:

  (Y F)

Ouch! That crashes the interpreter with a stack overflow. That’s bad news.

Wait a second, not so eagerly.

Crashing the interpreter is sure not desirable, but always look at the bright side: it’s good news too! The application is non-terminating, resp. in practice, it runs out of stack memory. That’s indeed a good sign, namely a sign of a recursion. Unfortunately a recursion gone wrong.

At first sight, it might be puzzling: we have encoded the famous $Y$ combinator but it does not work. As mentioned, however, $Y$ is not the only combinator to achieve the trick, there are variations of the general idea of self application.

The equation for $Y$ from above was written as term of the $λ$-calculus. Scheme can be seen as an implementation of the $λ$-calculus (with additional features needed for practical programming such as I/O etc). To be precise, there are also different $λ$-calculi, including many different typed versions, but Scheme most closely resembles an untyped $λ$-calculus.

But Scheme is a programming language, executed in a particular way, namely doing applicative order: arguments in an application need to be evaluated first before handed over in a procedure call. $λ$-calculi are often presented without fixing an evaluation strategy, resp. the evaluation strategy is left open and arbitrary. As presented in the lecture, for purely functional settings, the evaluation is based on substitution, the so-called substitution model from SICP. An expression can have multiple places where do so a substitution, i.e., multiple opportunities to apply a procedure to its argument(s), and an evaluation strategy fixes which one(s) should or could be taken. The lecture covered applicative and normal order evaluation, as the two practically relevant one for functional languages, but for the $λ$-calculus one can study more strategies (which involves where to evaluate and when to stop. Some strategies even allow multiple places in parallel or allow random choices). As a side remark, for $λ$-calculi one often speaks also of reduction strategies instead of evaluation strategies, and the basic substitution step is called a $β$-reduction step (but it’s another word for substituting the formal parameter of a function by its actual argument), and evaluation means “reducing” an expression to its value.

Scheme uses applicative order, it follows eager evaluation. And that’s the problem here. If we apply $Y$ to $F$, $F$ gets substituted into the body of $Y$, which is another (self-)application, that needs to be evaluated. After substitution, there is another (self-)application, so the process never ends, there is each time still another application as argument, and eager evaluation requires that the argument needs to be evaluated, so it never stops:

\[\begin{array}[t]{l@{\qqad}l} Y\ F & \rightarrow \\ \mathit{let}\ f = \lambda x. F (x\ x) \mathit{in}\ f\ f & \rightarrow \\ (\lambda x. F (x\ x))\ (\lambda x. F (x\ x))& \rightarrow \\ F\ ((\lambda x. F (x\ x))\ (\lambda x. F (x\ x))) & \rightarrow \\ F\ (F\ ((\lambda x. F (x\ x))\ (\lambda x. F (x\ x)))) & \rightarrow \ldots \end{array}\]

My bad, it’s recursion, but useless…

But it can be repaired. What’s needed is to delay the further evaluation of self-application argument, something like

\[\begin{array}[t]{rl} (\lambda x. F\ (\mathbf{delay}\ (x\ x)))\ (\lambda x. F\ (\mathbf{delay}\ (x\ x))) & \rightarrow \\ F\ (\mathbf{delay}\ ( \begin{array}[t]{l} (\lambda x. F\ (\mathbf{delay}\ (x\ x))) \\ (\lambda x. F\ (\mathbf{delay}\ (x\ x))))) \end{array} \end{array}\]

At that point, the argument of the outermost $F$ is not further explored, but handed over as value to $F$. After that substitution step, its an expression that looks like:

\[\begin{array}[t]{lll} \lambda n. & \mathit{if}\ & n= 0 \\ & \mathit{then}\ & 1 \\ & \mathit{else} & n \times \langle\text{self-application again (with delay)}\rangle (n-1) \end{array}\]

That’s a function that takes a number as argument, and does the body of the factorial and uses itself again as continuation in the recursion case. In particular, the body after $\lambda n$ is not further evaluated. It only starts getting into action when we provide a numerical argument. But this time, when giving a numerical argument, the recursion will stop, as at some point it will hit the base case, (at least for arguments $\geq 0$), just as the factorial does.

Now how do we do that form of delaying? Not evaluating arguments in a procedure call also underlies normal-order evaluation and the closely related notion of lazy evaluation. It also called delayed evaluation (or call-by-need), just what we are looking for. The lecture discusses two special forms delay and force in that context, but we also discuss how one can delay evaluation without relying on those built-in special forms.

It goes like this: First observe that a $λ$-expression like $\lambda x. e$ is a value, it counts as evaluated. In the $λ$-calculus, one might find places in the body $e$ where one could reduce, if one allowed substitutions to be done at any place inside an expression, not only on the top-level, but that’s not how it works in Scheme (or programming languages in general). Procedures only get evaluated when and if actually called. Now suppose that $e$ represents itself a function. It could itself be an application but after some evaluation steps it will evolve into a function. But by adding a $\lambda$ in front and applying $e$ to the formal argument $x$, we can delay the evaluation of $e$:

\[\lambda x. e\ x\]

That’s the trick that delays the evaluation of $e$ until an actual argument is provided. NB: in the $λ$-calculus, $e$ and $\lambda x. e\ x$ are said to be $η$-equivalent (“eta-equivalent”). Of course, it’s required that $e$ does not by coincidence mentions $x$ as free variable. But we can also pick another variable instead of $x$ if need be.

The trick make sense only if $e$ corresponds to a function, so (lambda (x) (1 x)) is not really meaningful. In the 100% pure and theoretical $λ$-calculus, everything is a function anyway and one needs not to worry. In Scheme $1$ is not a function, so we would have to be careful, but thankfully, the self-application $x x$ represents a function. So we can use the $η$-delay trick and write it up like that:

\(Y' = \lambda F. ((\lambda x. F\ (\lambda y. x\ x\ y))\ (\lambda x. F\ (\lambda y. x\ x\ y)))\)

That’s also known as the strict variation of the $Y$ combinator, as it does the job for eager functional languages like Scheme (and strict means following eager / applicative order evaluation).

And now, we are really done! For good measure, let’s just give the corresponding Scheme code.

(define Y'
  (lambda (F)
    (let ((f  (lambda (x) (F (lambda (y) ((x x) y))))))
      (f f))))

Wrapping up some loose ends

The $Y$-combinator is also called Curry’s paradoxical combinator (after Haskell Curry), and $Y$ and its variants are known as fixpoint combinators. Ultimately, those are just complicated functions or procedures, exploiting self-application in one way or the other. But why combinators? There’s no deep meaning behind it. Ultimately (and for historical reasons) a $λ$-term without free variables is called a combinator. With this terminology, (lambda (x) (* x x)) is a combinator that calculates squares as there are no free variables. Of course if we count * as free variable, which we should if we take it 100% exact, then it’s not a combinator, but let’s ignore that here and no one speaks like that anyway and says “square-combinator”…. Anyway, there are versions of the $λ$-calculus that do away with variables altogether. One cannot even write down “procedures” with formal parameters, as there are no variables at all and one is forced to work with combinators only. The calculus looks quite alien, and it’s connected to combinatory logic. Indeed, the $λ$-calculus (both typed and untyped) have roots and deep connections (also) in and to logics.

Why is it called paradoxical combinator? That has to do with said connections to logic. Curry and others invented and investigated such combinators in connection with (foundations of) logics, and $Y$ and its friends have connections to logical paradoxes.

Why fixpoint operators? As it turns out, applying $Y$ to a function (like $F$) calculates what is called a fixpoint of its argument, like a fixpoint of $F$. A fixpoint of a function as such is easy to understand: a fixpoint of $f$ is a value $a$ such that $f(a) = a$. For our specific $F$, the fixpoint of the construction results in the factorial:

\[Y\ F = f_\mathit{factorial}\]

but it’s a general observation: A recursive function can be understood as fixpoint of a function representing the effect of its body, and a $Y$-combinator calculates the proper fixpoint.

Proper fixpoint means, the smallest fixpoint though working out in which way to understand “small” and understand why it always exists and why it is uniquely defined would require more explanations and background. fixpoints are quite interesting, for instance, there is a connection between “eager” and finite data structures which are smallest fixpoints of some construction and “lazy” and potentially infinite data structures, like streams, which are largest fixpoints. But we leave it at that (perhaps for a later post) as the text is getting longish already…

2025

Back to top ↑

2024

Y Y?

17 minute read

or why Y?

Worklist algorithms

12 minute read

A small amount of background information (best read in the context of the first sets)

Back to top ↑

2023

Y Y?

17 minute read

or why Y?

Back to top ↑

2022

Back to top ↑

2021

Fol Soundandcomplete

10 minute read

w— title: “Sound & (in-)complete proof system for first-order logics” excerpt: “Some clarification (I hope)” last_modified_at: <2021-07-15 Thu> hea...

Back to top ↑