Martin Steffen’s homepage

Welcome to Functional programming (IN2040), autumn 2024

2024-05-22T00:00:00+02:00

About this semester

Lectures

Corona is a thing of the dim pasts and lectures will be held the way it used to be, in the lecture hall (Simula@OJD).

Youtube

One perhaps positive effect of the virus times was that many lectures produced video-ed versions of the presentations, like screencasts or recorded lectures. This lecture as well. We will not make a new version of the videos, there is no reason for doing that (and lot of effort went into it), but we will link in the versions produced mostly 2020, uploaded at youtube.

Language

The lecture will be given in Norwegian. 2 years ago, autumn 2022, it was actually was my first lecture ever given in Norwegian, so this is the third time. We’ll see how it goes, maybe my Norwegian improved. In these (inofficial) pages, however, I’ll write in English, it takes too much time otherwise.

Exercises and group work

The videos are fine as supplementary information and electronic format. Though reading the book and in particular doing the exercises and obligs is central for mastering the material. I like to compare that with learning how to play the guitar. No particular reason for exactly choosing guitars as comparison, except that since some time I try to learn doing that. Of course one can watch and listen to Eric Clapton, Andrés Segovia, or Eddie van Halen on youtube (or whatever your top guitar hero might be), maybe watching the close-ups of the finger plays, or listening to some guitar teacher. All that may be instructive, it may help to correct the way you hold your guitar or place and move your fingers, or you may see new techniques, and it sure can be inspiring and motivating, to see and listen to some master or instructor.

Seing enough youtube and reading enough material, one may be able to convincingly talk about what’s important when playing a guitar, describe techniques, using terminology like a pro (finger picking, leganto, arpeggio, whatever). But a talk about how to play guitar is not a live gig. So unless one strums the six strings with one’s own fingers, starting with simple chords, and gradually advancing the level of difficulty, one will never get far.

Same here: talking about programs like a pro (tail-recursion, applicative order, higher-order functions, stream-programming, to mention a few topics covered by the lecture…) is not the same as programming. So, doing exercises is important!

Learning is not the product of teaching. Learning is the product of the activity of learners. (John Holt)

Worklist algorithms

2024-02-23T00:00:00+01:00

Motivation

Compilers make use of many different algorithms for numerous tasks in the different phases of the compiler. They come from different algorithmic areas, like graph problems, or searching, optimization etc.

In the lecture, we have encountered some problems (and actually there might be more later), where we said:

A worklist algorithm would be a good idea to solve it!

Sometimes I sketch a worklist algorithm, and sometimes I just mention such an algorithm, and use instead a solution that does not bother with a worklist formulation.

But I never nail down what worklist algorithms or worklists actually are. Since, as said, worklist algorithms show up at different places in the lecture and show up in more places inside or outside a compiler, this post tries to shed some light on the concept.

For concreteness sake, I won’t make a general discussion of principles of worklist algorithms as such, nor try a panoramic overview over different applications. Instead, we discuss it mainly in the context of the calculation of the first-sets in the chapter about parsing. So the text is best read together with the material about the first-sets and I won’t repeat here the definition of that concept and what role it plays in parsing.

First-set of a non-terminal (simple version and no worklist)

We take, for the discussion, the simplified version of the algo, namely the one that does not have to deal with so-called epsilon-transitions. Those add an extra layer of complication as far as the first-sets are concerned, though the principle of the worklist algorithm is the same. So we take the simpler one for illustration.

Here is the pseudo-code of the non-worklist algo for the simplified version of the first-set calculation:

This formulation of the algo makes no effort to focus where work needs to be done, thereby wasting time. The “work” that needs to be done are the steps in the body of the loops, here simply updating the information about the first-sets in the corresponding array, increasing that information until saturation.

Run of the algo on a simple example

The algorithm is illustrated in the lecture by a standard example, a grammar for arithmetic expressions:

The following table illustrate the run of the algorithm, going through different passes (the picture from the lecture is taken from K. Louden’s book ``Compiler Construction: Principles and Practice’’):

The three passes shown in the table correspond to three iterations of the outer while-loop. Actually, the algo goes through the outer loop 4 times, so there is a 4th pass. In that last round, the algo detects that nothing more changes compared to pass 3, so after the 4th round, the algo terminates.

In the table, many slots are empty. That are the cases where nothing changes, i.e., where the corresponding production is treated, but updating the array does not actually increase the information in the corresponding slot, it leaves it unchanged. So that’s wasted work.

How do improve that?

The worklist algo (or worklists algos in general) improves that, avoid unnecessary work, and additional and in connection with that, to organize the work rationally.

In this example, the work to avoid is the single line in the body of the inner loop. If one could do the update only in those case where actually something changes, corresponding to the non-empty slots of the table, that could be an improvement.

One could reformulate the algo as follows:

So, this version calculates an overview over those places which really require an update. For each pass, it stores those in a worklist, works it off to do the required work. This way, it skips over the useless updates as they are not contained in that list. Never mind, that this time a repeat-until is used instead of while-loop, that’s irrelevant for the discussion here.

What’s wrong with that?

Skipping work sounds tempting, but to call the code an improvement makes basically no sense… The worklist, constructed at the end of the loop-body, calculates exactly the places where work needs to be done, if any, in the next pass. It does so by checking if First(A) $\union$ First(X_1) is different from First(A). But if there is a difference, the major part of the work has been done already, namely doing the union operation (and additionally a comparison on top). So checking first all productions or slots, that require actual work, i.e., where a real update would happen if one would do it, and then focusing on those where that is the case does not really bring anything. So one replaced the routine repeated update of all productions with a (repeated) check of all productions to find out precisely which one to use for an update and which not, and that does not save anything. Unless it’s the act of updating itself which dominates the costs, but that is implausible.

Doing away with passes and with the worklist

The original version as well as the previous one works in passes or rounds. If, in each pass, the task are tackled in the same order, one could call it a round-robin strategy. Both versions do that, the first does the update indiscriminately for all productions, the second one focuses on each round on exactly the productions where the first-set information needs an update, and employs a worklist to manage that.

The following version does it slightly differently: it checks if a production leads to an update, and if so, does the required update immediately. In that case, no data structure is needed that remembers for a while that some work needs to be done later. So, the WL data structure is unnecessary.

The algo picks an arbitrary production where work needs to be done is picked and treated. Consequently there is no guarantee that, once a production P is treated, first all others are checked to be treated or skipped, before it’s P’s turn again. In other words, this version is no longer round- or pass-based. Due to the randomness of which production to treat next, such a formulation is also called chaotic iteration.

So that algo checks where work needs to be one, and never does unnecessary update. Since the necessary work is done immediately, there’s no need to store it in a WL data structure, so one would not call it a worklist algorithm.

Actually, the chaotic iteration has the same problem than the round-based version before, which used a WL: doing the checks to avoid work is not worth it.

There is another silliness in the code, namely the fact that the union of the two first-sets is calculate calculated two times, one time for the check, the second time for the update itself. Actually, it’s worse. The algo has to find a production that needs treatment, and that may mean, it may search for one, repeatedly checking productions that turn out to need no treatment before finding one that does. This means, the attempt to focus on exactly the productions that actually need treatment to avoid all the places where it’s unnecessary multiplies the effort (at least in the chaotic iteration version).

For fairness sake: the random algo called chaotic iteration is not meant as template for a realistic solution. It is more an extreme case of approaching the problem. Note that being completely chaotic means, that, by chance, it could in some run behave like, for example, the round-based one. Since one can prove (under some assumptions we won’t discuss here) that chaotic iteration terminates with the correct result, one has, at the same time, established that all other specific strategies, like the round-based one, also work correctly.

Before finally addressing the real worklist solution, let’s summarize the insights so far. We have seen first a solution that treats productions unconditionally, without checking whether it’s needed or not. On the other side of the spectrum is a solution that never does unnecessary work, but checking whether work is needed or not did not improve things.

Needed is a way to

skip pieces of useless work (productions, slots in the table,… ) without searching for work and without checking whether it’s useless or not!

Actually, the round-based version using WL data-structure would also not be called worklist algorithm, because it does not make use of this core aspect of worklists.

Approximation of the work load and managing dependencies

To achieve that, the core trick is to approximate the work to be done. So the work list does not contain exactly the tasks that need to be done, but those that potentially require work. The only thing one is sure is: if some task is not in the worklist, it can be safely skipped.

So the worklist over-approximates a precise version of a worklist, which would be too costly to maintain. Besides that, it also avoid searching for work to be done, which is a bad thing to do, as we discussed in connection with the chaotic iteration.

Before we say, how to achieve that, we should clarify the following: when we said, a task is not in the worklist, in our case of the first-set, a production, and thereby can be safely skipped, then that is just a snapshot of the current situation. The worklist is ``worked-off’’ by the algorithm, i.e., the algorithm picks a piece of work from the worklist, which may be real work, leading to an update, or not. Then the work, i.e., the update is executed (and the piece of work removed from the work-list, as being done). However, the worklist does not only get shorter, it will typically also get larger. That means, some production where it’s clear it needs no treatment now (being not currently listed in the worklist), may be entered into the worklist later as suspicious of potentially in need of treatment. Indeed a production (or piece if work) may removed and be re-entered to the worklist many times. That is characteristic of worklist algos. An algorithm that uses a list and then trickles down the list from head to tail, treating one element after the other, is not called worklist algorithm, even if it does work and it uses a list. But it’s too trivial to deserve the name…

We have mentioned that treating a piece of works removes it from the worklist, as being done for now. We have still to explain how a piece of work is (re-)entered into the worklist. That has to do with the fact that some pieces of work depend on others. In our setting If we update the information on one non-terminal may make it necessary to (re)-consider other non-terminals, resp. productions for them, and thus (re-)add them to the worklist; again without checking if the re-added production means real work, or not. But because of that dependency, they are suspicious of potentially requiring work and thus added to the worklist.

If we look again at the table with the passes 1,2,3 for the expression grammar example, we see also that some productions, corresponding to lines in the table, in some pass require work, sometimes not, though the example is so simple that no production is targeted twice.

The dependencies form a graph. For instance in the expression example, the expressions depend on terms, terms depend on factors, and factors in turn depend on expressions. More precisely, the first-set of expressions depends on the first-set of the term non-terminal, the first-set of the term non-terminal depends on the corresponding information for factors, and finally there is the dependency of the first-set information of the expressions on that for factors.

There is no direct dependence of the information for factors on that for expressions (only an indirect dependence, and a dependence in the opposite direction). In the table, the information about factors, which is added in the first pass, reaches the terms in the second pass, but not the expressions yet, as there is no direct dependence and it takes a 3rd pass to propagate to expressions, and a 4th pass to find out, that all is stable by now.

A worklist algorithm would treat (productions for) terms as being directly dependent on factors, but expressions not. Thus updating information for factor would add term-productions to the work list, but not expressions. They may or may not be added depending on whether the term productions lead to an update to term or not (on this particular example, there will be at least one situation, where terms are updated, and thus expressions will be treated, as well, of course).

Finally the code

Without further ado, here some pseudo-code based on work-lists for the first-sets (without epsilon-transitions).

We said, one core trick is to avoid checking. As the code shows, of course there is still checking whether work needs to be done, but the real purpose of that check is not to avoid the corresponding update when not needed (though that is a consequence of the check), the real purpose is not to (re-)add some pieces of work to the work-list that potentially need update, because they depend on the current production. That is the purpose of the check. Note also, that this also avoids searching for work, at least to some extent. Since the work list contains places where work may or may not be required, working off the worklist may remove productions that are ok currently, before one finds one that needs treatment.

Of course, especially in the light of the discussions above, the code is slightly silly, calculating the union of the two sets with the current first set information two times, first for the check, and then again for the update. But that’s easy to avoid, storing the result in a variable.

Note in passing: in the semantic phase or static analysis phase, there are techniques that allow a compiler to detect situations like that, where the user is silly enough to evaluate a complex expression more than once, like in the previous code. The compiler would then use that information and transform the program, in the way sketched: calculate the result once, store it, and later reuse it (if that is that improvement).

In this particular piece of code, the reuse-the-result situation is pretty obvious: the expression is reused immediately after the first use. Sometimes it’s more complex, and one has to take also check if it’s a real re-use situation. Just because the same expression occur twice, one has to make sure that one really comes before the other (in the presence of loops and branches) and also that the result is still the same. The expression may contain variables, and those may change, at least in an imperative programming language. So, it’s a non-trivial analysis problem; it’s a data flow problem, which is an important class of program analysis. In this lecture, we will not dig deep into many different data-flow problems (there are very many), in particular we won’t elaborate on the one sketched here; that one is actually called available expression analysis. But we will discuss liveness analysis, which is another important data-flow problem.

As it happens, liveness analysis, available expression, and many data-flow analyses can well be solved by a worklist algorithm…. As mentioned, there are many problems, especially inside a compiler, where worklist algorithms are useful.

How to write a parser for the first time, e.g. for the oblig

2024-02-13T00:00:00+01:00

The task of the mandatory assignment is to write a compiler, perhaps not a full-blown compiler for a complex language, not even half-blown, but still covering some phases. One phase is the parser, resp. the lexer and the parser, both working hand in hand.

The task is not supposed to be overwhelming. It’s a focused and rather well-specified task using mature tools and techniques. It has been done many times before. Though done by others, for instance earlier participants of this or similar courses.

For someone who does it the first time, for instance for the mandatory assignment, it involves to get an overview over a couple of moving parts that all need to fit together and which are all potential sources of errors. Hopefully, the underlying concepts like regular expressions, BNF, bottom-up parsing, etc. are halfway grasped from the lecture. That helps, but only up-to a point. Now one has to fit it all together into a running program. One has to deal with CUP, with JLEX, or similar tools, one has to deal with ant, if one uses that, etc. all coming with pages and pages of documentation. So, where to start?

Concretely, I describe first steps in the Readme at the repos. They should be followed for quick check, if at least the provided initial checks work out of the box, i.e., Java works, jlex and cup work, ant works, in the provided configuration. And that should be done right away.

But that’s of course really only one initial step. Then the real work starts. In earlier years, sometimes groups or individuals got stuck or overwhelmed. They had invested effort to rig up a parser, defined the grammar, designed some AST, etc., coded it up, but somehow “it did not work”. And then it took quite some time and help to debug it… It happened not often, maybe 3 times I saw that “approach”, but it may be that more people had similar problems without asking for help, perhaps silently throwing the towel and not completing the obligs, or fighting it through with more effort and headache than the oblig is supposed to cause.

I gave, in those cases some advice, which I repeat here, mostly in the “let it grow” (and keep on testing) paragraphs below.

Mainly it’s how I would do it resp. how I do it, when writing a parser, or other things, especially if it’s the first time with some new technology or language I have not much experience with. I have written compilers and parsers a couple of times (also for the compila language and others), mostly in the context of teaching or research, though not for an industrial grade compiler project from scratch. The two mentioned paragraphs are in particular for the parser and the lexer (i.e., oblig 1), though similar strategies I use also in other contexts. Especially the remaining points are more general than for a parser.

The strategy I describe is meant to avoid the scenario from above: Namely to design it and code everything up, only to detect in the end (which is typically shortly before the oblig deadline…) that “it does not work”. There seems many “bugs” of this and that kind, the exact causes are a bit unclear, it’s unclear when they were introduced, some syntactically correct compila programs throw an error, some erroneous programs parse, so there seem also troubles with the grammar (which still has some shift-reduced conflicts that need to be analyzed and removed). And the build-file is not yet properly adapted, so one has to fiddle with it and try to get it running semi-manually, and errors are hard to reproduce. It’s a big mess.

It means the project has grown out of hand and the code has become overwhelming, one has lost overview and no longer knows where to start repairing; there are too many flames to put out and time is running short. It can become rather frustrating. And maybe it was all done with good intentions: “Let’s make first a good battle plan, design the grammar precisely according to what one has learned in the lecture, then carefully design some AST data structure, according to the recipe from the lecture, or according to one’s own ideas, fix all the reserved words as they are specified in the language specification, etc.”. When everything is thought out carefully and seemingly complete, one integrates the parts and and checks if it works, perhaps in our case, trying to parse the provided compila test program.

Only to find out that there seems to be a number of problems, maybe starting from the fact, that the integration fails, it just does not “fit together”, and that’s only the beginning of the bug hunt.

The sketched strategy is not bad per se, it can be seen as textbook top-down development. However, I would not approach the problem like that, in particular, if it’s the first time and if unsure about how lex and yacc actually work, and unsure about other technological aspects.

I am not saying top-down development is bad, it has its place, and most development processes are probably a mixture of top-down and bottom-up anyway. Actually, the overall parser-task (or both obligs) is developed top-down: there is a pretty detailed specification up-front, the language specification, there is some test suite provided up-front etc., before anyone starts to write any line of code. It’s only that the specification is not written by the course participants and coders. And that the specification and implementation phases are done by different people is not unheard of either in software development.

Bottom-up, integrate early, let it grow and stay healthy

The way I would approach it is to integrate early, actually right from the start. I.e., doing a compilable and runnable main program of some sort. The integration should involve a yacc or CUP or whatever specification, a lex specification, and a main-program that joins them to a program, that can read in a file and parse it. Also part of that integration must be the build-file. If you are using ant, you can use the provided build.xml as starting point; or do Makefile or maven or whatever. The task at hand, programming a parser, is rather easy, so any of the build facilities provide more functionality that you need for that, and you should be able to configure one of those without (much) reading their documentation and becoming a build-tool virtuoso. Keep it simple and straight.

Don’t use Eclipse or any fancy IDE for integration. I am not saying you should not use Eclipse or any of those when programming. But don’t rely on their behind-the-scenes wizardry. Perhaps they manage stuff because you configured your tool and set-up your project this and that way, and you don’t even know what is done. Besides that, the end-product of the project, the compiler, has to work independently from any development environment. One cannot expect that someone downloads a piece of program and it’s run within one specific IDE or editor. Users use programs, they do not edit or develop them. Ok, we will expect that they build them, but not more. We are not targeting consumer-level users, but users interested in checking out a compiler …

Since the end product needs to come with a build-mechanisms, in the spirit of early integration, also the build process as one of the several moving parts of the project should be in place right from the start!

That’s important, because when developing the parser, it’s a good idea to check and re-check and re-re-re-check the growing parser. And that testing must go painlessly, fast, and easy. If testing is hard and cumbersome, it’s not done. Building a compiler requires at least a few steps, like invoking yacc, invoking lex, and then compiling the whole code (and perhaps running the compiler).

There is not overly many steps, so the build process is fairly easy. Still you want to automatize it.

What to leave out first and what to ``grow’’?

Now, it seems everything is already in place from almost the start, what is left out? I would start with an almost trivial grammar and perhaps even an almost trivial lexer specification as well.

That means, one does not start with trying to nail down the compila grammar as described in the language spec. Instead, one makes a parser for a different language, maybe for a super trivial start one that corresponds to a grammar like

program ::=

With the grammar like that, there is only one syntactically correct program, which looks this

  begin
  end

One should test out whether one can have an integrated parser being built without problems.

And one should test if the parser parses correctly according to the current state of the grammar. So one has to (temporarily) provide one’s own test program(s). Also testing should be integrated into the build-process.

Instead of starting with trivial program as first shot and an integrated ``parser’’, one can also work bottom-up in the grammar, starting from numbers first, then doing also expressions, then doing some statements, then adding more etc.

At any rate, if one grows the parser that way, it’s in my eyes important never to break again the integration, the fact that it builds, and that it does what (at that stage) the compiler is supposed to do. It’s not just early integration, is a bit like continuous integration.

At any rate, progressing like that make error localization easier: when it works properly, but after adding, say, loops to the syntax and the keywords suddenly something breaks (maybe the grammar suddenly contains conflicts), it’s more easy to know where to put the blame.

To stay continuously healthy when growing the compiler may also be less frustrating than just coding for a while until discovering that one cannot put it together into something that works. It gives a feeling of steady progress to see that larger and larger parts of the grammar are covered. And actually, each new addition gets easier and more routine anyway, so at some point one simply adds all the rest, since one has gotten a feeling how it all hangs together.

Parse only first, without AST

When letting the grammar (and the lex-specification) grow like that, one may at the same time grow the AST and perhaps the ``pretty-printer’’ alongside. That’s fine.

One can, in my opinion, also do it differently: for a start, ignore the AST (and the pretty printer) when growing the parser. That means, one leaves out the ``action part’’ of the grammar completely. Or rather one makes it to ``stub’’ returning not a a tree, but perhaps an integer 0, or nothing.

The disadvantage is that, when running the compiler on test files, one has not much output, except perhaps warnings from yacc, or when the parse fails, the parser prints some error diagnosis. Instead of doing no action at all, one might add print-statements, something at least is printed.

Once the parser works and seems happy with the test program complexaddition.cmp, one replaces the print-statements with creating ast tree nodes. Again, one may find it useful to follow a gradually growing approach, replacing not all print statements at once. Note that replacing the print-statements with AST generation requires fiddling with the cup or yacc-file: the types of the corresponding tokens have to be changed from void for print to whatever type has been chose for the AST node in a given grammar clause. For example, in the first stage (without AST), a non-terminal say program intended to represent the whole program could be declared as

non terminal            program;

Once the parsing seems to work, one can start thinking about the AST. A plausible name for the class representing the root node of the AST, i.e., the whole program, may be Program. When adding the creation of such a node in the action part of the non-terminal program, the return type is no longer “nothing”, so one needs (re-)declare the non-terminal as

non terminal Program            program;

in the cup-or yacc-file.

What one should not build gradually, I think, is to first do the lexer and then the parser. In a standard lex/yacc set-up, both scanner and parser work so close together that testing the lexer without the parser is not worth the effort (and the lexer is mostly not the source of much trouble anyway).

Further general advice

Testing

The “stay healthy and integrated while growing the project” makes sense only if one continuously tests (and builds and compiles). When new productions and clauses are added, they need to be tested, resp. the test-file(s) need to be changed, since such grammar changes change the (current state) of the language.

Versioning

Well, the handin procedure of the oblig is via git, so it’s natural to use that right from the beginning as well. It’s part of the initial early integration because it’s part of the end-product, the delivery of the compiler. Not that it’s hard, and I guess most people have worked with git or similar tools, but at any rate, you don’t want to figure out how it works at deadline time.

Versioning has other obvious advantages (I won’t preach them here). If you happen to work in a group, and a few do, it’s more or less necessary for sharing the code base anyway. The project is small enough that one can focus on simple use of git, without all bells and whistles and fancy stuff.

Whether one commits often or not is a matter of taste, but in the spirit of staying healthy, one should not commit broken versions (which are those that don’t build) after having reached a first buildable integration, at least not in the main branch. Especially not if one works in a team.

Remembering and learning from mistakes

The project is not very big, and the time is not too long. Still, sometimes some trouble occurs, and one solves it one way or the other. For instance when building the grammar: perhaps one runs into some shift-reduce or reduce-reduce conflict. Sometimes one fiddles around and it goes away, without exactly knowing why, but at least the bug is gone. Better, of course, one realizes what made the conflict go away.

And it might be good to note the problem and the solution down in a way that allows to find it again. It might not be the last time one has to deal with such a conflict. And even if the time-span of the project is not very long, one forgets faster than one would wish…

Bug tracking

This remarks is similar to the previous one: keep an overview over things. I recommended “staying healthy”, that in a way means, don’t let bugs linger around, but deal with them. Especially those that are show-stoppers, breaking the build process. But still, there may be rough edges, or things that are not solved 100%, without stopping the show. They should be dealt with, but one can deal with only one thing at a time. Some some things are more important than others, so one does one thing first and postpone the other. That’s when one should write down what needs to be done later (the more concrete and where exactly, the better, especially if one knows how to deal with it roughly already. As said, one forgets faster than one thinks, and then later, unfortunately, one has to look-into it what the hell is wrong with some thing one stumbled upon earlier. And where was the troubling issue again?

I don’t say one should use bug tracker software, the project is manageable without, but if one postpones (or ignores) relevant stuff, one should take note of that.

What if working in a group?

In a group of, say, two, one may work shoulder-at-shoulder, sitting in front of the same screen. Some recommend that 4-eyes programming style of direct interaction.

But there are other ways of collaborating. As far as the parser task is concerned, splitting up in a way that one person does the parser and one does the lexer does not make sense. The only halfway meaningful split is: parser/lexer for one, and AST (and pretty printer) for another (perhaps also splitting AST and pretty printer, but the AST must be there before the pretty printer, so it not ideal).

When saying splitting, I don’t mean complete independence, one has to coordinate and exchange information. Especially at the beginning, one needs to have the early integrated version running for all members of the group. If one prefers independent development and sharing the load, the AST + pretty printer can somehow be developed independent from the parser. Earlier I have described one can gradually develop the parser without generating and AST but having some dummy actions first.

That way, the AST and the parser development can occur at the same time, and then one has afterwards to integrate them (hoping that this does not lead to pains).

Anyway, I am not sure if I wholeheartedly recommend that, it’s also that, when programming the oblig, it’s instructive to be hands-on and familiar with the whole parser, not just one half, simply for learning how the parser part works and how the AST works. But I don’t prescribe how one internally organizes group work. The only form of “collaboration” I don’t want is, one does all the work, and both put their names under the end product… Everyone has to contribute to some meaningful extent.

If the project would be larger, for instance a semester developing a more realistic compiler, and different groups working on separate phases (one does the type checker, one does the parser, one does the data-flow etc), then the approach, where everyone has one’s finger in the details of all parts is unrealistic, then it has to be divide-and-conquer (and a lot of coordination and interfacing). But a parser for a language of the given size is small enough to be conquered without being divided.

Final remarks

I have been involved (at different universities), with quite many software projects of that kind, like “lab work” or programming projects. Sometimes compiler-related projects, sometimes other things, and often collaborative projects. Collaborative in that not everyone programs the same stuff, or maximally groups with 2 people work jointly on a common piece of code, but a number of groups or individuals collaborating on a joint larger projects, tackling different parts. That requires more “management”, planning, organization, monitoring, coordination, meetings, interfaces, and a development strategy and plan.

My remarks here are mostly based on experience with earlier rounds of the compiler course, or those other programming course work, but also on other projects as well.

Not particularly related, 3 or 4 years ago, I read the book the pragmatic programmer (or listened to the audio book version), and I quite liked it. I don’t remember exactly what advice the book had to offer, but I remember that I agreed on many things, and some of the advice from that book reminded me to things like the ones written up here for the mandatory assignments in the compiler course (or other courses). Undoubtedly the book formulates such things better and deeper and more systematically (and perhaps with more experience), and of course contains much more information.

This post here is more narrow, writing up some of my own advice I sometimes communicated, illustrating a strategy in particular for tackling the parser and lexer systematically, if one is not yet sure-footed with all practicalities that come with the task. There’s nothing wrong with not programming the parser in one blow, and not “growing” it, if one is comfortable with the task an the technology; the task itself is not so big that it requires such a cautious small-step approach. But still it may be helpful, if one is new to the task.

Regular expressions, context-free grammars, and recursion

2024-01-18T00:00:00+01:00

Regular expressions and context-free grammars

(Context-free) grammars specify the syntax of a programming language and also play a central role for parsing: the parser needs to implement the functionality to accept or reject a given input (a token stream coming from the lexer). When syntactically acceptable, the parser also typically builds up a abstract syntax tree to be handed over to subsequent compilation phases.

In their generality, context-free grammars are too expressive to be used for parsing. Thus one works in practice with restricted forms. For instance context-free grammars that can be processed by certain bottom-up parsers (typically with limited look-ahead, though unlimited look-ahead is a theoretical possibility) or by top-down parsers (again maybe with limited look-ahead).

We discussed lexing and parsing as two important early stages of a compiler. The work-load is clearly separated: the lexer deals with aspects covered by regular languages and the parser with context free languages (resp. restrictions thereof motivated by considerations of balancing being expressive vs. being still pragmatically useful and reasonably efficient). The two classes of languages correspond to two particular notations, namely regular expressions and context-free grammars (in (E)BNF notation maybe).

Those are declarative notations to specify the corresponding languages, i.e., they do not immediately correspond to a procedural execution mechanism that implements (the core of) a lexer resp. a parser. As well known, there are machine or automata models that correspond to regular and context free languages. Those are finite state automata and push-down automata. The latter are automata equipped with a stack, which is some unbounded extra memory that allows to capture the additional expressiveness needed for context-free languages and parsing.

Let’s ignore fine points here, for instance the fact that, as said, practical parsing never uses the full expressiveness of context-free languages, and hence has no need for the full power of push-down automata. In the lecture we don’t even formally define exactly such push-down automata in their generality. Though, specifically for bottom-up parsing we will encounter a construction resulting in a mechanism that uses a stack and which can consequently be understood as one particular restricted form of push-down automata. But we will just look at the construction itself and leave it at that.

Regular languages as restriction of context-free languages.

That’s easy to see. Regular languages are those describable by regular expressions, and context-free languages are those describable by context-free grammars. So to see that regular languages are a restriction of context-free one, one needs to argue that each regular expression can also be written as context-free grammar. To additionally see that it’s a real restriction would require to find at least one context-free language, that cannot be captured by a regular expression. We focus in this post on the first point and come back to the question

how to represent a regular expression as a context-free grammar

later. And it will be really easy.

Another argumentation that regular languages are a subset of context-free languages could go like this: finite-state automata are a mechanism to capture exactly regular languages and push-down-automata are a mechanism to capture exactly context-free languages. Push-down automata are nothing much else than a finite-state automaton with an additional stack. No one forces the automaton to make actual use of the additional memory, and ignoring the stack (never pushing anything or consulting the stack for decisions which step to take), turns it into a standard finite-state automaton. So it’s obvious that regular languages can be captured by push-down automata and thereby are a subset of context-free languages. To see that it’s a real restriction, similar like above, one would need to find one example of a language where using the stack is needed and where the language cannot be captured by a finite-state automaton (without stack).

Why do automata for context-free languages need a stack?

The lecture does not look into the the construction, how to turn an arbitrary CFG into a push-down automaton (we don’t even define exactly what a push-down automaton is and how it works). But still, why intuitively a stack?

Context-free grammars work with non-terminal symbols and terminal symbols. The non-terminal symbols can be seen as variables in that it does not matter how we call them. Except of course, there are smarter choices and not smart: a non-terminal intended to represent expressions is better called EXP or expr or similar, rather than X1, to help human readability, but otherwise it does not matter. The terminals correspond to tokens.

Languages (described by regular expressions, context-free grammars, or whatever) is interesting only when they are infinite. If one has a “language” consisting of a finite number of “words”, one can simply make a list of all the words or elements (and probably one would not bother to call that a language, though with the terminology of language theory it would be).

To describe a concept consisting of an infinite number of elements one needs some scheme or notation that represents repetition, like adding an element, and then another, and another etc. That means some form of recursion, or, iteration or loops etc.

Indeed, all interesting context-free grammars are recursive. If not, they would describe a finite language, and that would be uninteresting and it would be an extravagant waste to use a context-free language for that purpose. The following (small) grammar illustrates that, it’s one of the many variations of capturing expressions we encounter in the lecture:

expr   ::=  expr "+" term     | expr "-" term | term
term   ::=  term "*" factor   | factor
factor ::=  "(" expr ")"      | number 

The syntax of regular expressions in its basic form does not support variables, which in the context of context-free grammars are called non-terminals. Of course “variables” are a super-convenient mechanism. They allow (at least in declarative notations or mechanisms) to give a name to something. That’s useful in more than one aspect. One is that (properly) naming things enhances readability for humans. And it’s a form of abstraction: a name is a short form or abbreviation for something more complex. If one remembers and understands what a name stands for, one can use the name instead of the complex expression or “thing” it represents. Or maybe that’s not two different reasons for being useful, more like two aspects of the same thing.

Be it as it may, since it’s useful, extended regular expressions support “naming” (for instance tools like lex and friends). In the lecture, we had an example using quite a number of variables or names in connection with giving names to various formats involving “numbers”:

digit     ::= ["0"-"9"]
nat       ::= digit+
signednat ::= ("+"|"-")nat | nat
frac      ::= signednat ("." nat)?
float     ::= frac("E" signednat)?

We use notationally the same symbol ::= we used for the rules or productions of the context-free grammar from above.

Comparing the two definitions, there are similarities and a crucial difference. Following standard terminology for grammars, let’s call the names or variables non-terminals also for the extended regular expressions. In both examples, the left-hand sides of the equation system consists of one non-terminal, only. The right-hand sides contains words containing both non-terminals and terminals (and in the case of extended regular expressions, some other special-purpose meta-level notation, like “?” “+” etc., but that’s not relevant for the discussion here). So much about the similarities.

What’s different in the the context-free grammar clearly uses recursion (as is expected for context-free grammars): the recursion is direct, is defined in terms of itself, and indirect: is defined in terms of , which is defined in terms of which in turn is defined in terms of again. For the numbers example, that’s not the case. Non-terminals are defined in terms of other non-terminals, but not recursively: digit is used in the definition which is used in the definition if signednat, etc., it a cascade of definitions, but there is no cycle (= recursion) in the equation system.

That is also the reason why using definitions in such a (non-recursive) way will not enhance the expressive power of the regular-expression formalism. One can simply reduce some concept, say the one called signednat by replacing the non-terminals on the right-hand side (the 2 mentionings of nat in this case), but their resp. definition, and then replace the non-terminal of that definition (digit) again until all non-terminals are gone. That replacement or substitution process will find an end, since each definition only uses non-terminals defined strictly earlier (since no recursion is allowed).

Recursion needs stacks

Functions procedures calling each other (even without recursion) builds up a stack, (resp. it un-builds it when returning from a call). That reflects the LIFO discipline of calls and returns under execution. That call-stack is typically “implicit” and manage internally when the a program in the given programming language runs. The part of the compiler responsible for (the design of) that stack memory and other memory management arrangements is called the run-time environment (there will be a chapter about that, too). It’s also known that one can turn a recursive program into one without recursion; perhaps one works with a language so ancient or restricted that it does not support recursion or not even procedure calls. In that case one has to implement a stack data structure oneself, and push and pop off arguments oneself (instead of leaving that part to the run-time environment).

That brings us to one of the original question: what’s the connection of push-down automata (automata with a stack) and parser machines for context-free grammars? Grammars are recursive definitions, respectively describe recursive data structures. One particular grammar describes a tree data structure reflecting the grammar and whose instances are aptly called syntax trees. Working with recursive data structures such as trees involve recursion. That’s most visible in a top-down parsing method called recursive descent. That will be discussed later in the parts about parsing, and it involves an realization of a parser where each non-terminal is represented by a procedure responsible for parsing that particular non-terminal. For instance, in our example, there would be a procedure f-exp to parse expression, that procedure would recursively call f-expr and f-term because expressions are defined using and (and f-term would be the function to parse . Since the grammar is recursive, that will lead to a parser-implementation having a number of mutually recursive parser functions, calling each other, when presented with an input to parse. Thus, invoking the parser function leads to number of recursive calls (determined by the input) and this works with a (call-)stack

No recursion, no stack

What about regular expressions? As explained, a crucial difference between context-free grammars and regular expressions, even if one allows oneself the luxury of defining variables or non-terminals, is the absence of recursion. Indeed, if one consults a book like Compiler Construction, Principles and Practice on finds it explicitly stated that recursion is absent from regular expressions.

There’s a point to it (and so far we basically seemed to elaborate on that point). However, as we likewise said, in order to capture infinite collections of things (like languages) in a finite description, one needs a way to express repetition or recursion or iteration.

The standard notation in regular expressions for that is the Kleene star, of course. That’s not part of the core notation of context-free grammars. But one can easily achieve the same by recursion. So a regular expression like r* can be of course be capture by

A  ::=  r A

where A is a non-terminal and the production is (directly) recursive in A. If we have recursion, there’s no need for the Kleene star. So we have answered also question raised at the beginning of this text, namely a demonstration that any regular expression can be equivalently represented by a context-free grammar (the other operators are no challenge at all).

Right-recursion and iteration

Looking more carefully at the representation of Kleene star, we see that the translation is of a particular form, it’s a recursion but the non-terminal which is used recursively occurs only once and, what is more, recurs only at the end of the right hand side. Not occurring at all would be fine as all, as then there’s no recursion.

Another name for that form or recursion is right recursion or actually tail recursion! The above right-recursive example is a situation of direct right-recursion, but one can generalize that also to indirect left recursion, like when A is defined by a right-hand side with B at the tail position and B is defined by a right-hand side with A at the end, etc.

The word tail-recursion is more used for programs where functions call each other recursively, but only at the very and of the function bodies. But it’s an analogous thing.

What do we know about recursion and tail-recursion? We know that recursion in a programming language involves a stack as explained, and we know that if one has a tail-recursive situation, a good compiler or interpreter can execute the program without making use of a stack! That trick, avoiding the stack, is called tail-call optimization. In the context of some programming languages, it’s also conventional to call tail-recursive situations not recursive, but iterative (Scheme for instance), even if there’s no loop or some other iterative construct involved.

All that discussion about comparing regular expressions, finite-state automata, stack-automata and context-free languages can be summarized as follows:

The Kleene star can be seen as iterative or looping construct. It can be captured by (stack-less) finite state automata. Recursive definitions, central for context-free grammars, need automata with a stack. Right-recursive grammars are analogous to tail-recursion, and are equivalent to regular expressions (and consequently don’t need a stack).

A small final remark: What about left-recursion? Obviously the Kleene-star of regular expressions could also be captured by left- instead of right-recursion. However, there’s no corresponding recursion scheme (“head-recursion” instead of tail-recursion) as it makes no sense in general for a function to call itself recursively immediate at the beginning of its body.

Did Fortran kill a Mars Mission?

2024-01-16T00:00:00+01:00

In the lecture, we shortly brush on lexical aspects of Fortran. A quite ancient language, as far as programming languages are concerned.

We discuss some such aspects, namely keywords and in particular the treatment of whitespace in Fortran. Not because of specific interest in Fortran, but to highlight differences to more modern treatment. So to say, to learn from a “bad example” how to do it better.

Keywords

Both issues are very typical tasks of a lexer or scanner. Most languages have indeed a number of so-called keywords or reserved words (but not all languages), or maybe one should say: Fortran has keywords, but they are not at the same time reserved words. Lexemes like GOTO or FOR can get scanned into tokens for jumps resp. as part of a looping statement. But it’s also possible to use them as variables. For instance, GOTO = 5 can be a syntactically correct assignment as part of correct Fortran program. Note that in most languages, the character sequence GOTO would be acceptable as identifier (provided that capital letters are fine for identifiers), were it not that the scanner forces GOTO to be interpreted as token for a jump, not for an identifier, goto is reserved for that purpose. That’s an example of the fact, that scanners prioritize some matches over others: that’s what it ultimately means, to reserve some (key-)words for special usage, thereby disabling other uses.

Curiously, in Java, goto is a reserved but unused word, i.e., one cannot use it as variable, though there is no goto-statement in Java.

Whitespace

Another common aspect covered by lexers is whitespace. Similar to comments, white space is largely ignored: it’s not turned into a token, but the scanner jumps over the whitespace and comments until encountering subsequent non-whitespace or non-comment characters.

Being ignored in that way does not mean that whitespace is completely “meaningless”. It serves as separator between (other) lexemes. That’s in contrast with the treatment in Fortran, that. There, whitespace was not used for terminating a lexeme and thus serving as separator between lexemes. In Fortran, adding blanks or removing blanks made no difference at all.

What were they thinking?

Why it was done like that, I can only speculate. Fortran was one of the first high-level programming languages, if not the first, then at least the first “wide-spread” one. Nowadays, treating whitespace like that is seen as a bad idea, but in those pioneering days, one may not have had much experience with good conventions for scanning or other parts of a compiler. Or language pragmatics was just not top of the list of language design priorities…. Later programming languages made better choices there. Actually, a similar thing happened for written (alphabetic) languages. In antiquity, languages were often written without whitespace (and without interpunctation); there are also languages outside Europe written like that. That form of scripture is called scriptura continua, here is an example:

Only over time, people figured out that whitespace and punctuation (and perhaps the use of capital vs. lower-case letters) helped structuring the text and improve readability.

For Fortran, it may simply not have been a conscious design decision. Perhaps the first robust implementation simply did it like that, filtering out whitespace completely. It could be that for the corresponding routine it seemed just the simpler or more straightforward implementation, or the first that came to the implementor’s mind.

Once, the first compiler was out and shipped to customers along with some IBM 704s or some other early mainframes, that treatment just stuck, it was “just how it was”. Afterwards it was fixed in early programming manuals, see for instance the reference manual for Fortran on the IBM704:

Blank characters, except in column 6, are simply ignored by FORTRAN, and the programmer may use blanks freely to improve the readability of his FORTRAN listing.

Once, things like that are fixed, they won’t die out (even if most would agree it had been a bad idea). But being backward compatible and to be able to still compile older software is an important goal. Maybe even the old-timers and Fortran veterans looked down on novelty things like treating whitespace as terminators and reserving words (muttering things like “I myself prefer to write very compact code, whitespace is for wimps”).

Another example where lexical conventions refuse to die out is COBOL. When I took a COBOL course at university, one thing still stuck in my memory is that it was important how to indent stuff properly, like certain things must be indented with exactly 4 blanks etc. Programs could look like that:

IDENTIFICATION DIVISION.
  PROGRAM-ID. HELLO-WORLD.
  PROCEDURE OUTPROC.
      DISPLAY 'Hello, world'.
      STOP RUN.

One had to follow the indentation requirements slavishly, it almost seemed to me that the art of COBOL programming focused on how to properly indent (the rest was easy because it could be understood). Then I found out why one has to do it: the reason for such an indentation regime was an inheritance from the time of punch-cards. I don’t know details how punch-cards dictated indentation for COBOL; here are some comments on stack exchange on the issue. Anyway, when that became clear, I decided to drop out from that course. I had signed up for that extra course about learning COBOL out of interest, but it turned out to be all-out boring for a number of reasons, not just because of indentation fetishism. It seems like that at some universities, proper indentation is still part of grading COBOL solutions (I don’t know how up-to date those grading instructions are, though).

In our context of Chapter 2 about lexing, indentation (and tabbing, etc.) is also often treated as whitespace. However, not as in traditional COBOL, where indentation matters. There are other languages where indentation has meaning, though that may be a matter of design, not because it’s a holdover from punch-cards.

Early orbital and interplanetary NASA missions (with or without Fortran punctuation and whitespaces)

In the lecture, we discuss lexical aspects of FORTAN as potentially obscuring, at least seen with today’s eyes. There are a number of apocryphal stories around FORTRAN and failed NASA missions. Probably also concerning other languages (including assembler). Maybe these stories also simply express a certain amount of schadenfreude: the high and mighty NASA embarks on a zillion-dollar mission to boldly go where no man ever went. Or perhaps Russians had gone there recently, which one heightened the pressure to shoot the damn thing up the sky. Only to see it go BOOOMM right after lift-off. Then a meek explanation for such an extravagant firework followed claiming that everything was engineered top-notch, only the computer wizards, at one single place in a huge, otherwise correct program, had mixed up unfortunately a dot with a comma. And the public chuckled at the thought that the brainy propeller-heads at NASA, highly trained in the esoteric witchcraft of programming, cannot even use proper punctuation. It’s a good story, it was probably told and retold and written about, and in the end, it was no longer so clear what exactly happened.

But it’s known anyway, that computer glitches brought down quite more than just one space mission, thought it’s not always attributed to a single dot or comma.

So one of the stories and code snippets (often repeated and told in different contexts) indeed uses Fortran and it involves punctuation. There was a piece of control software, which contained a loop that should have looked like the following:

DO 15 I = 1,100

What actually was written instead was

DO 15 I = 1.100

That changed the meaning drastically. As discussed, scanning FORTAN means throwing away all spaces, and the program was treated as

DO15I = 1.100

And, my bad, that’s an assignment to the variable DO15I and not a loop. This (and similar) stories exist in different variations. It has been told in connection with the interplanetary mission Mariner 1, though that seems unconfirmed; the cause of the Mariner 1 debacle is mostly attributed to a different silly error, not in connection with lexing, not even in connection with FORTRAN (sometimes a hyphen is mentioned, or an overbar). As origin of the FORTRAN loop-vs-assignment glitch is mostly seen not Mariner 1, but a different space program, namely project Mercury. That is seen by many as the most plausible origin story.

See for instance the link to Ars Technica, quoting info from the RISK digest and forum. This digest, over the years, actually since decades, provides solid background information and discussions concerning computer related hazards and accidents.

Full employment theorem for compiler writers

2024-01-15T00:00:00+01:00

In the introductory chapter, as a sort of “motivation”, I mention something about an employment guarantee for compiler writers; it’s not explained much on the slides, mentioned only in the script-version, though). The correct term is

Full employment theorem (FET) for compiler writers.

There are other motivations to study compiler technology, of course, and the lecture mentions some: there will always be new languages, new platforms, there will always be a need to translate from one format to another etc. Last not least, the techniques and principles we learn in connection with compilers are often also relevant on other contexts.

The FET is of a different nature. It’s not actually a motivational statement, promising good job opportunities, but it’s actually a theorem, a mathematically proven fact with a peculiar name.

Concretely, it’s about optimization of compilers resp. optimizing compilers.

Improvement and optimizations

It goes without saying that the compiler is (supposed to be) correct. An optimizing compiler is one that makes some effort to perform well, without breaking correctness. What optimization exactly means, varies, but it generally refers to the quality of the produced code: it should run fast, be memory efficient, the code itself should be compact, etc. Beyond the quality of the produced code, there may be other criteria, like that the compilation itself is fast, but the latter is not meant by the FET.

Different legitimate optimization goals may clearly contradict each other. For example, there is often a trade-off between memory usage and speed. The theorem is also not bothered by that fact that goals may contradict each other. The argument works for all criteria that refer to the behavior of the produced code. For instance for the one of the simplest, the code size.

One can find two slightly different formulations of the FET for compiler writers. They are closely related; actually the second one is a pretty direct elaboration of the first one (and both are pretty direct elaborations of the well-known halting problem.)

One could say, the first formulation is concerned with optimizing compilers, the second one with the problem of optimizing a compiler. It’s the second version which is mentioned in the lecture.

Optimizing compiler

As mentioned, an optimizing compiler is a compiler that does efforts to produce good code, given a criterion for that. As a very simple criterion, one can take the size of the produced code. That may not be the most important criterion in practice, but the argument works independent from what one chooses as optimization target. Using the size of the program just makes the argument particularly easy.

The word ``optimizing’’ is not meant as finding optimal code. The word “optimal” means “best”, better than or at least as good as all alternatives. And optimizing means, as said, producing ``good’’ code, not optimal. A (hypothetical) compiler that produces optimal code is called fully optimizing compiler.

Full optimization in that sense is impossible, and that fact is the core of the FET for compiler writers.

The proof actually is very simple. Very simple, at least, if one makes use of a central result of computation theory, the undecidability of the halting problem. The discovery, formulation, and proof of that required a giant like Alan Turing, but the FET is a very easy consequence. It works like many such impossibility results as a proof by contradiction: If such a thing as a fully optimizing compiler existed, one could use it also to solve the halting problem. As this is not possible …. Case closed.

Slightly more explicit: assume as optimizing criterion, as said, the size of the program. Programs that the compiler wants to (fully) optimize may or may not terminate, and it’s just undecidable generally what is the case. Of course, given one particular program, it’s well possible, maybe even easy to determine whether it terminates or not. It’s just that there is no algorithm (a decision procedure), that decides the termination status for all programs.

Let’s assume the smallest possible way to achieve a non-terminating behavior is something like while true do skip, i.e., just an single non-terminating loop. Now, a fully optimizing compiler would optimize non-terminating programs into that non-terminating loop. Non-terminating programs, on the other hand, are optimized into other programs. Which case it is is easy decidable syntactically, by just looking at the optimized code.

It may be the case that the language to compile to has two or even several ways of expressing non-termination which are equally minimal. For instance, the language may support repeat skip until false which may cost the same as the while-formulation in terms of memory. But that does not invalidate the argument; if there are variations, all of them are clearly representations of an empty, infinite loop, which can be syntactically determined. It’s important that the detection is syntactic, not semantical. The question ``is a program semantically equivalent to a non-terminating program’’ is nothing else than a slightly winded formulation of the halting problem.

Optimizing a compiler

A second formulation of the FET for compiler writers takes the argument a step further. It not resigns to the fact a fully optimizing compiler is plainly impossible, but points out that there is always room for improvement:

It can be proven that for each ``optimizing compiler’’ there is another one that beats it which therefore is ``more optimal’’.

That means, any compiler, no matter how ``optimized’’ and already tuned, can be improved, so compiler writers will never be out of work (even in the unlikely event that no new programming languages or hardwares will be developed in the future…).

The proof of that elaboration of the FET is likewise easy. It goes like this. Assume you have a optimizing compiler, say Opt. It can be turned into a better one in the following way, where ``better’’ is understood in the sense that it produces smaller code. We simply take the insight from above one step further to make a tiny ``improvement’’ to Opt.

It’s clear that there are (non-trivial) programs that don’t terminate (actually, there will be infinitely many), and say NT is such a program. Then one can improve the optimizing compiler Opt by checking whether the input is (syntactically) NT and optimize that particular one by the minimal infinite-loop program (say it’s called LOOP):

  Input (P)

  if P = NT
  then output ``LOOP''
  else Opt(P)

Of course, in practice, that makes no sense!

It’s useless to improve a compiler on a case-by-case manner. Furthermore, the proof is not constructive in that it does not give a concrete construction or algorithm how to actually optimize a given compiler. Note, it also relies on identifying non-terminating programs NT. It’s easy to construct arbitrarily many non-terminating programs, one simply needs to take the non-terminating program LOOP and massage it a bit (like adding commands in the body or simply using LOOP;Q where Q is some random code). But that’s a very fake and useless form of optimization, adding a check for one hand-crafted variation of a non-terminating program after the other.

So, the compiler writer can not only point out that a given optimizing compiler can always be improved, but that such improvement cannot meaningfully performed automatically: Hence, the services of of a smart compiler writer will always be in demand. And provably so!

As a side remark: the addition of individual checks for checking if a piece of code matches a know formulation is not altogether meaningless. I resembles the way virus scanners are improved and updated (``optimized’’). It’s undecidable to decide whether a piece of code downloaded from some page, or received via email, is harmful. That’s again because semantic properties of code in non-trivial languages are all undecidable. Thus, companies maintaining virus scanners keep data bases of know viruses and their variations and mutations. And keeping a virus scanner up-to date means, adding virus signatures to the scanner, that in the meantime have been detected as harmful. The signature is just a syntactic pattern that can be used to match against a virus. A virus scanner may not be a compiler, but it uses techniques like scanning and perhaps parsing to scrutinize code.

Do only compiler writers have a guarantee of full employment?

As seen, the FET is a fancy way to state the formal impossibility of automatic optimization, in this case for compilers. Optimization is a wide field where one often faces undecidability. Hence there exist FETs for other fields, as well.

How to prepare for an oral exam?

2024-01-11T00:00:00+01:00

The standard exam for the course on functional programming is written. Except for the times under corona, where it was a variation of a written exam, a “home exam”. However, the repeat exams are regularly held as oral ones. One reason is simply expedience. There are not too many candidates who need another exam. A written exam requires a lot more preparation than an oral one, which also needs some. For a written exam, one has to come up with new questions (but not too new, they have to be comparable to those from previous semesters). Setting up a question set alone takes time and involves iterations. And the content ideally needs to be quality controlled, best actually not just read, but solved and worked through by a colleague and not any colleague, one that is more than a little familiar with the material and the course (and has time at that time of the year). The Norwegian text needs to be read and polished with the help of a Norwegian, as even with a lot of effort, I cannot reach the level of 99.5% clarity and smooth formulations and without small glitches and strange formulations. The result needs to be pressed into inspera, there needs to be a proposal for a solution and some grading instructions, to achieve uniform grading, as always at least 2 graders are involved. At least the inspera set-up need to be finished 10 days before the exam.

With the written exam scheduled for mid-January (the oral a bit later), all that needs to be done at a time of the year which is already filled up with grading the real exam, done in December and being graded till early January, resp. filled with giving feedback for the exam and preparing the new semester. That’s too much effort (especially if it’s for just a handful exams), and it’s basically not managable unless one has prepared already 2 exams in November. Consequently, many repeat exams (not only for IN2040) are oral.

In my eyes, it does not make oral exams a stopgap or nodløsning. It’s a valid form of exam for courses, independent from the number of participants. Of course with a growing number of candidates, there comes a point where an oral exam is more effort than a written one. Written exams, which may be more common at UiO, are officially called “skoleeksam”, which, in my ears, sounds strange: we are a University not a school, and an oral exam is a valid form of examination at univiersties. One should not just be able to solve some little “exercises” that one has been trained to solve, but be able to explain concepts and shed light on a larger picture. Zoom exams and home exams under corona were a nodløsning, but oral exams are not; some of course may disagree, but I think anyway there is not one single form of exam that makes sense.

Anyway, it’s not that

an oral exam is like a written one, just oral and under enormous time pressure (each oral exam is planned for a time slot of 40 minutes).

The material and the pensum is of course the same, but an oral exam examines mastery of the material for slightly different skills. Consequently preparing for an oral exam and preparing for a written one is slightly different.

About this text

This post tries to give information about the oral exam, what to expect and how to prepare. It’s my view based on my personal experience. Experience from the time when I was a student and, of course, also from the many times I was examiner resp. assisting oral exams, like taking notes, doing the protocol, or being a sensor. In the Norwegian system, there exists the role of the sensor, some extra person who is able to understand what’s being asked and answered, and who plays a more influential role than just a silent note taker or witness.

Either way, I have been part of many oral exams, both as student as well as on the other side of the table. At earlier universities were I have been, outside Norway, oral exams in the computer science and physics curriculum were more common than here (at least at that time; I don’t know if that has changed). Basically, at master level, oral was the norm, below that level, written ones were also common. But even at lower levels, and even with large lectures, orals where likewise not uncommon, besides other exam forms like project work, (pro-)seminar presentations etc… At my first university, where I myself started, at that time beginner obligatory lectures had something like 450 or more students, and that’s a lot of orals…

So I’ve been to more than a handful oral exams as student, and, starting from my PhD-times, I must have been to literally hundreds or thousands of individual oral exams for quite a number of lectures of various sorts (my own or lectures of others).

The lecture here is functional programming with quite a number of technical content (like recursion, higher-order functions, environment model, evaluation strategies etc.), and focusing on problem solving (coding). That form of content influences to some extent the style of questioning.

Another influence is the examiner. I’ve seen quite a number of examiners, with different styles how to ask questions and how to structure the exam. Some give big freedom to the candidate, some favor precisely and narrowly formulated questions (maybe even written down) and expect a quite quick, narrow (and hopefully correct) answer. The style also depends on the content of the lecture. Some lectures are more about remembering stuff that has been covered. For instance I was a number of times involved in a lecture about communications architectures and network protocol layer standards, a lecture where one had a lot to remember), other are more about understanding concepts and/or being able to solve stuff. FP is more of the latter flavor. Below I will say a bit how I structure the exam and the form of questions and answers.

What to expect from the exam?

Goal of the exam

The topic is functional programming, resp. the aspects of functional programming (and not so functional programming) in Scheme covered by the lecture and SICP. The intent is to check breadth and depth of knowledge about that, the understanding of the concepts and checking the ability to solve problems. The goals influence the design of the exam (see below). Generally, we stress the understanding aspect.

To make an example: Let’s assume during an exam there’s the question “can you sketch an example of an environment model?”

If a candidate, who has thoroughly memo(r)ized everything, remembers the details of one of the figures that has been presented in the book or on the slides, draws the corresponding boxes, balls, and arrows, then that’s fine as such. But it’s expected that one can explain what it is. If the drawing is produced together with explanations what those boxes and arrows are and what it all means, that’s perfect. If one only remembers miracously the picture, but cannot explain even when asked, that’s not worth much. That’s a bit different in a written exam. Often one is just required to produce the answer, if its correct, it’s fine, no matter if the candidate knows what’s been done and can explain it or not.

That may sounds as if oral exams are harder. On the other hand, if the given solution contains errors (the picture from the example, or or some piece of code), that typically leads to deduced points; the errors are defects in the answer. Errors are of course don’t count positive in an oral one, but no one is expected to write don’t immediately a flawless answer. In a written exam, the end-result of an answer is graded (and one has decent time work out some answer and think it though and double-check it). For an oral, there is much less time, and it’s the process of arriving at an answer or solution or approaching it. If there is some error, there might be question (intended to clarify things), like: “look again at this arrow at the procedure object, it goes from where to where?” (maybe the arrow is the wrong way around or the arrow had is not drawn etc). And if the candidate then sees the error, explains what should have been there and why etc. then the whole glitch doesn’t count negative. As said, no one is expected to provide a flawess response on first attempt and without hesitation.

Of course, it depends a bit on how much helpful questions or outright help from the examinor is needed. If a halfway decent solution cannot be reached without major assistance from the sensor (giving hints, asking helpful questions etc), if that drags on too long that counts negative, already for the fact that it takes time and the amount of material that can be covered gets less.

Design of the exam

The oral exam is a form of dialogue or interview with a fixed time. It’s also a structured or guided dialogue: the examiner asks and the candidate answers. It’s not 100% rigid, also the way the answers are given shapes the dialogue, leading to follow a up question, or resulting that the examiner gives help or hints, or tries to get to the answer by reformulating the question.

I have seem exams like this: at the beginning of the exam, the candidate is given one or maybe more than one question, and then having some minutes to think out or work out an answer or solution. During those minutes, perhaps the student is left alone to think undisturbed, before called back to present the solution.

We don’t do that, the questions won’t be of the nature that requires 10 minutes working out or solving something, resp. if it’s a question that refers to working out something, it will be like “how does one solve this-or-that”, and the intention is to see if the candidate knows how to approach the problems, which steps one would go through if one had the time, perhaps starting to sketch some steps, but mostly not carrying them through. There is simply not enough time to do that in many cases.

Covering different areas

As said, one goal is to check the breadth (to a certain extent). Of course we cannot ask everything, so there will be a selection. In other lectures, I often use 3 main sections of roughly equal duration (plus maybe a shorter general questioning section or side issues). For functional programming, it will be probably more, maybe 4 sections or even more. Each of the sections is dedicated to one topic. Once the time for that slot is up, we shift to the next part: “ok, let’s move on to a different topic, say streams. For a start, tell me ….”. Strutcuring the exam that way is similar to the written exam; also there there is a number of problems, each covering mostly a specific area (sequence-operations, tail-recursion, procedure-based objects, streams etc), and some questions arranged in a number of sub-problems,

The reason why for functional programming (unlike for other lectures) are probably 4 sections or more, not 3 and maybe not as clearly separated is that the material and the kind of material does not lend itself too well to selecting 3 topics where one can go into deep.

Inside an topical area.

Inside some topical area, we typically try to steer the question from high-level to more low-level or more detailed ones. That’s to check the depth, how deep can we go.

At least in theory, since for some lectures that works better than for our lecture about functional programming. That’s because the material has not too much “conceptual depth”. It’s not meant to say, the lecture is not challenging or complex.

But if we start a line of questioning for “recursion”, one can follow-up with “tail-recursion vs. ordinary recursion and tree recursion”, but there’s not much “deeper” we can go in this linear line of question. One can dig deeper and pose a e questions involving code or maybe too, but that’s more or less how deep we can go. That means, even if recursion is a plausible question, the “field” is too small for one “section”, and in that section there will be other mildly related questions, but not necessarily deeper. If we also ask about tree recursion, or processes etc. it’s maybe in the same section, but it’s not really deeper, it’s just another question in that general area. So the questioning goes more sideways, not deeper sometimes. But still we want to structure and plan the session (per exam) somehow, not throwing random questions (small and big, from arbitrary places in the material) at the candidate.

Generally, trying to start a line of questioning from the top is done also for psychological reasons. If one starts right away with a very specific one, the chances are higher that maybe the candidates does not know the answer; that increases the nervousness, and then one may try a littler simpler, but still the answer not really going smooth, so in the end, the candidate cannot focus on anything else than thinking that already some early question were not done well, and that can influence the rest of the topic negatively. So, better is top-down, I guess (to the extent that’s possible here for our lecture).

Another reason why strict top-down lines of questionings, even if planned, not always works is that the answers shape the questioning. It can happen that I ask a question, and perhaps it turns out difficult to answer, so one backs off, making a more high-level or more general formulation instead, or a slighty other related issues (while sticking to the general “area”). That’s no immediate reason to worry either. To partly back off is meant to have something else to talk about, partly as assisting, because one can come back to the original question afterwards. When backing off a bit and talking about something a bit less specific or something slighty else, that often brings back ideas what is meant by the original question which then one can answer. It’s not uncommon, and as long as questions are answered it does not matter in which order.

What questions to expect?

I always say:

The questions that will be asked are actually known!

Maybe not the exact wording of them. If one asks “look at this piece of code and tell me…”, the exact piece of code may not be known and vary. But apart from that, the pensum (the slides, the book, the exercises…) should give a comprehensive picture what will or can be asked.

Like: if there is a slide with header “memoizaition”, there can be a question “what’s memoization?”. If there is topical area or section called “streams”, so there can be a question “What’s a stream?”. The latter topic contains details like “delayed evaluation” and “implicit stream definitions”, so there might be in the exam the follow-up question “thanks for the explanation of streams, but can yo also explain implicit streams” (or give an example in code, or say what’s delayed evaluation is” or “We discussed memoization in the context of streams, can you elaborate? Maybe start what by saying what memoization is in general?” etc.

Being asked the original question about streams, a good way of answering it is, maybe after say what streams are in general is, to proceed by explaining also implicit streams or given an example or explain delayed evaluation. In other words: to “volunteer” additional elaboration instead of waiting until resp. if that follow-up will be asked. Remember: for most questions we don’t expect one-liners as answer, there is basically always meaningful further elaborations to add, and offering that (by continuing adding relevant related information) is good. If we think, that’s enough, let’s move on, we say so. Volunteering in this way for relevant elaboration does not only show that you know yourself what is additionally relevant for the questions, somehow getting the bigger picture and how things hang together, but (hopefully) represent also that additional material correctly. And that’s time used positively in the exam.

Of course, if one vaguely remembers, memoization was somehow discussed in connection with streams, but one cannot remember why that was and what memoization actually is, it’s a bad idea of course to volunteeringly mention “memoization” (because mentioning the word will trigger a follow-up), but rather hope the line of question stops there, or offering delayed evaluation of stream-cons instead (because one remembers that stuff). Or not elaborating on anything, waiting for whatever questions, if any, will be posed as follow-up (hopefully not memoization…).

Offering additional relevant information is also good in connection with examples. For instance saying “Let me illustrate this with a small example”, that’s often a good way to demonstrate knowledge. And this way, you have control over the example. That may be preferable over waiting until or if the examiner ask “Look at this small example, can you explain the concept with it?” Already choosing a relevant, interesting example (and not too big) shows understanding. Of course, explaining a concept on a non-self-chosen example shows also understanding.

Now, back to the original point: what questions will there be?. I said, basically the questions are known (apart from details), and I mean it like that. In an earlier university where I worked as posts-doc, there was a professor from some other chair, who was known for publishing a long list of questions before the exam (on the internet and/or on the blackboard of his group, so the students could print them or make a copy). By coincidence, his lecture was about functional programming and it used SICP (I myself was not involved in the lecture, but was a few times involved in the exams about the material). Publishing the list of potential questions sounds weirder than it is.

Similarly, when I was a student myself, the student organization had collections of questions having been asked by this or that professor for this or that course. After surviving an exam, students were encouraged to note down the questions to the extent remembered during the exam (that’s not always easy) in order to help next year’s students. Welcome were also remarks commenting on the style of exam, like “that professor wants details, be careful, I had to solve things like XXX from the exercises in detail” or “the exam focused for me mostly on general stuff, I was over-prepared remembering tiny details and notation, but I was not even asked, but it went still ok”. After noting that down, one dropped that in the post-box of the student organization (nowadays via email or an “app” or a digital “løsning” no doubt…). So, when preparing for an oral exam, a smart thing to do was to go to the office of the student organisation for computer science, borrow the collection, and make a copy of the compiled questions of the last years.

So, since everything repeats more or less, the questions were more or less known. But even if a question is known, it may still be answered well or less well. And those lists where both helpful and, actually, not so helpful. They were not so helpful insofar that in principle, what’s being asked was clear to a good extent resp. should have been clear anyway. That’s why the public list of possible questions of the mentioned professor was not such a big deal. On the other hand, the lists were helpful. Not only because they contained (sometimes pieces of) information what kind of questions would typically occur and the style of exam, but giving the feeling one knows what to expect. Especially for the early semesters, if it’s one of the first oral exams, one could perhaps avoid loosing sleep speculating what on earth could happen. Seeing a (long) list of possible questions doen’t cut down the pensum, or make it easier to understand, but still it may feel more manageable.

How to answer?

I mean, how to answer, beyond giving correct answers…

There are two points to keep in mind, one is the fact that the exam is time-limited. The second one is, that the questions almost never expect a one-liner. For illustration, assume a question “what’s tail-recursion?” and an answer

“That’s recursion at the tail!. End of message”

That’s a, well, correct one-line answer, or at least a not incorrect one, but in this particular case is of course not very insightful either, almost an empty answer. So there will be a follow up, like “can you elaborate?”, or “what do you mean by tail?” or “what’s alternatives” etc. If the response to that is “What do you mean, I should elaborate in which way, can you ask more precise?” then the next question may be among other directions “What are other forms of recursion?” or “why is tail-recursion important?” or something related, just to obtain more information in connection with the initial question and to see whether the candidate understands what has been said.

This way of prodding interaction, trying to tease out information (with extra questions, help, or hints), is not ideal. For once it makes a better impression, if one elaborates relevant aspects in a structured manner oneself. Furthermore, it wastes time. Even if in the prodding-style, every single answer would ultimately be correct, not much ground would be covered. Before asking something more detailed or deeper or something else, the time for some batch of questions is over, and we start with a new line of asking, leaving many questions unquestioned.

Scratching only at the surface or covering only little ground, even if all answers are correct (or ultimately correct after trying to reformulate questions over and over), influences the outcome negatively.

For the same reason (avoiding waste of time), one should in answering not repeat information already given. Once answered, it’s done , and normally one gets signaled, that it’s answered (“ok, thanks about this, but what about that”) and then one should not say things about “this” again, maybe in different words: saying two times the same correct thing counts positive only once, the second time it’s a waste of time.

Of course not all follow-up questions by the examiners are prodding in a negative way, in fact many are not. So being asked an additional question as follow up is not a sign of having not volunteered enough elaboration. But if the questions consume more time than answers, it’s imbalanced.

What if I (as candidate) don’t understand what’s being asked or unsure what’s expected?

In such a case, just respond by “can you repeat/reformulate the question?” Or “Do you expect me that I do or explain the following?” “Does that question refer to ….?”.

What to do if I understand the question but don’t know the answer?

Well, not ideal, but it can happen. One should avoid to panic, of course. I think it’s seldom that one is completely blank. One could either volunteer for information about (mildly) alternative and related issues (but one not already answered). Or putting it into more general context. Maybe that is accepted by the questioner, however, the original question will probably not be forgotten (“ok, thanks, that’s correct, let’s come back to the original question…”) But as long as correct and related (and not already covered) information is given, it’s not bad, better than saying nothing probably and waiting for the follow-up question which may be in the same direction.

Also, it may feel better than plainly saying “I don’t know” avoiding panic, and it may be the case, that while talking about on slight background- or side-issues in connection of the original question, in the back of the brain, the original question resp. an answer becomes clearer, and one can answer. That can be a good answering tactic, saying something relevant, but slightly off first, delaying slightly thereby and while talking the real answer comes to one’s mind. It can work. Of course, one should use it with care. So when asked about streams, one should not try an answer like “Streams are a topic in functional programming, so let me start by explaining what programming is and then functionanal programming …”. That form of digression is way off, but there is always a bit wiggling room.

What will not be asked?

No trick questions.

From time to time, one has the impression, a candidate hesitates to answer a question, not because the answer is unknown but because a trap is suspected, a trick question. If the question is “what’s memoization”, then one sometimes see a dialogue like that (exaggerating for the purpose of presentation):

A: “what you mean!?! You mean just explain what it is or the definition?”.
Q: Yes sure.
A: “You mean like explaining in words? or making an example?”
Q: “Yes, sure, whatever you prefer.”
A: “An example, is it allowed to use one from the lecture?”
Q: “Yes, sure, if you remember one from the lecture, fine, or a different one, but don’t make it too complex”.
A: “So a small example would be enough?”

In such situations, one has the impression, the candidate fears “there must be more to it, I understand what what’s being said, but that’s too obvious, I wonder what they really mean with this question, if I just say what memoization is, it’s probably a trick”.

But it’s never a trick question. It’s said, that some companies in IT use fancy questions. Microsoft especially is said to employ those as part of their recruiting (there are whole books collecting questions preparing for interviews with Microsoft or other companies that use that technique, like “how many ping-pong balls fit into an oil tanker?” or strange puzzles and brain teasers). Those questions are supposed to require imagination, improvisation, thinking on the spot and, an all time favorite “thinking out of the box”. There’s no thing as thinking out of the box at a university ;-) So questions posed are meant the most obvious way. The task is not to guess or detect the hidden meaning behind a question, it’s to answer it.

At least a question is intended to be obvious and we don’t intend to speak in riddles. Whether the question factually is obvious, however, depends also on the one being asked. But if in a question like “what’s memoization” the word “memoization” remains unclear, that would be a sign of not having studied or understood that part and does not make the question a riddle. The normal reaction in that case would not be the above dialogue, it’s more like “I don’t know the answer, I skipped that part”, or “I can’t remember details, I just remember that…”.

When we see that a question is misunderstood or the answer goes into the wrong direction, we “intervene”. So, it will not happen that a answer runs for minutes down a blind alley, and after the answer is given, we say, thanks, and note it down as answered all wrong (and having wasted precious time). So we try to correct the course, and put the answer back on track.

That does not mean, that the millisecond the answer goes wrong, we shout “stop!”. I believe, being interrupted abruptly a few times in mid-sentence can cost nerves. So we keep our horses for a short while, until the answering sentence is finished or something, and only then interfere in some way. Note that if the answer is slightly off, we might let it pass and let the explanation take its course slightly longer, even if it does not 100% fit to the question asked (but we are still happy). In that case, since we are ok with the answer anyway, this would not count negatively or as “answer not given”. We might afterwards try to come back to the original question, or maybe not.

In case the answer is correct and well-formulated, on track, and proceeds smoothly, more interesting information is added etc., then we may let it run for a short while. Still we may pose additional questions, or also try to redirect or stop the argument. Sometimes we stop, because we have seen enough, it’s all good, the candidate sure knows the answer and the field, so no need to continue. Or if the argument, while still ok, has run it’s course, and the answer starts going in circles or covering ground that is more or less explored, so does not add much new information and it becomes a bit a waste of time. So we move on.

Finally, it sometimes happens that the argumentation goes too slow. For instance, one could see that sometimes when asking for an example or when the candidate offers an example: “let me sketch it with a piece of code or a figure”. In principle, that’s all good. But then, line by line, letter by letter, parenthesis by parenthesis, a code snippet slowly unfolds on the whiteboard or paper, hesitantly checking and rechecking the parentheses. That sometimes results in a bad use of the time, a low information transmission rate, so to say, especially, if someone works on the whiteboard silently, without additionally sharing information on what is being done and why.

Anyway, being “interrupted” in one way or the other or having the course of an answer re-directed is not necessarily a sign of a wrong answer, indeed, it’s quite common.

Can I answer with stuff I know outside the pensum?

That’s quite tricky, resp. it depends. You are not expected or required to know things outside the pensum, and we don’t pose corresponding questions.

If you know material outside the pensum, that you are sure is relevant for the question, and if you are sure that the examiners can understand what you are offering or at least get the clear impression that you know what are talking about and also get the impression that your answer is relevant for the question, then one may try that. If you happen to impress the examiners with relevant extra things outside the curriculum that nonetheless fit to a question, that counts in your favor.

Having said that: this is of course not (!) an advice to read up on all kinds of extra-curriculum stuff planning for a shock-and-awe strategy, dazzle the examiners with all kinds of additional related stuff. That has a very low return-on-investment ratio and may easily backfire… If one happens to know such extra stuff for one particular question or other for whatever reason, why not.

What one should definitely avoid is to offer alternate material instead of pensum material.

This quite seldomly happens, but still one sees it happening. Like “I don’t know what a message passing is according to the lecture, but I stumbled upon an interesting article on message passing on Wikipedia/on some paper” or “I could explain it for Julia; I like that language, and you sure know it too, right?”. Sometimes, it might not be a problem (but very seldomly so). One might for instance be tempted to try to illustrate tail-recursion, abstract data types etc. also with other languages (if one happens to knows that and is shaky on the SICP coverage) . To that extent it’s partly answered (to the extent that one has shown understading of tail-recursion etc). However, the lecture discusses those concepts with Scheme and that is also part of the pensum and material. At any case, answer deviating from the pensum or from halfway conventional terminology may to the least slow down communication, it may lead to misunderstanding and all that is not good.

Actually, it does not happen offten but sometimes people seem to use “alternative” explanations or definitions as evasive tactic claiming “but in some other book, message passing is used differently”. Argumentation like that is ill-advised, at least during the exam (it happens now and then), to the very least it wastes time. And as examiner it’s normally easy to see through that, if it’s used as evasive tactics. If there is really a significantly different definition from somewhere, maybe outside SICP or computer science and someone really know that material, then it’s not even relevant, and trying to explain what the alternative definition means may be successful (and in the end the examinor believes the candidate knows what’s being talked about), but also that wastes time (and is still probably irrelevant). Anyway, when it goes into that line of answering and we are not happy with that, we intervene anyway (as explained).

How fast should I answer, how long should I think before the answer?

One should not feel obliged to blurt out an answer. Sometimes one sees candidates, they start talking before a question is even finished; there is not much gained by that. Better carefully listen to what’s being asked till the end of the question. And perhaps taking a breath while collecting one’s thoughts.

However, there is not much gained either in remaining silent for a long time, until one has found the best way to say things. There is no best answer, so no need to try to formulate one silently in one’s head, and start speaking only after the perfect one is mentally chiseled out. The only situation where I can imagine a “best” answer exists is for very precise and narrow questions: “Is this procedure tail-recursive or not?” “yes”, ‘nuff said. Actually, nothing wrong with saying something like “Let me see, here’s a procedure proc it calls itself here and here, and at this place it’s called inside a cons and therefore it’s not tail-recursive”. That’s a bit longer, but maybe even better (if not dragged out too long).

So for a question like “is this code tail-recursive or not””, one could of course say yes or no. If tail-recursion has not already been asked and answered, one could shape the answer like “Tail recursion is a specific form of recursion. It’s characterised by the fact ….blabla”. Maybe even offer its advantages, before coming to saying specifically something on the shown code. If the question is answered by a short “yes” or “no”, the follow up will anyway be (if tail-recursion has not been covered as concept already)

“why you think it’s not tail-recursive, can you elaborate, maybe start by saying what tail-recursion is?”.

How precise should my answer be, resp. how “evasive” should I answer?

Well, the more concise, the more to the point etc. the better. However, it depends also on the question. Some questions are more “loose”. So the precision of the answer should somehow fit to the precision of the question. To respond to a question

explain the concept of environment models

let me start by illustrating the notion of interpreter, because Scheme is a interpreted language, so that I can more clearly position the role of the environment model afterwards…”

is probably not a good move (besides the fact an actually environment models or run-time environments apply to compilers (and other languages, not just Scheme) as well). The reaction from the examinor will probably be, “wait a second, could you stick more closely to the question”. Offering to start by shedding light on a super-broad context feels like evading the question. And maybe hoping the question will be forgotten. Even if somehow I would let it slip, like allowing to starting with a broader context, the question will typically not be forgotten (except that in the end, time’s up, like being “saved by the bell…”). But it depends on the deviation. For instance, if the question is about streams, an answer starting like

“let me first shortly explain what evaluation strategies are, specifically what delayed vs. non-delayed evaluation means, before I clarify what it has to do with streams”

that’s is probably ok (again if evaluation strategies have not been covered already), maybe even good because it shows that one knows that streams have something to do with delayed evaluation. Trying to shed light on the even broader of “interpretation” of programming languages or similar on the other hand would stretch it. Also starting by

Let me first explain first the substitution models before I come to the environment model

feels evasive (though it shows knowledge that both model have some connection). Starting by explaining the environment model, addressing the question, and afterwards offering “this is more general than the so-called substitution model, namely in the following way…”, that’s not too bad, so one might be lucky to be allowed to continue explaining.

In general, as mentioned before, one should not ponder silently the best answer for long. Mostly there is no such thing than the best answer. Starting to say something meaningful and related things in the direction of a useful answer is preferable over remaining silent for long stretches. Silence counts for not much, saying something correct and in response to a question counts positively, even though one could have said it better or shorter or more understandable, given enough time to polish the answer. Therefore, also doing an mistake during an answer (especially if it’s about details of code) does actually count negative, everyone makes errors, provided one is able to either spot the error, resp. if the examiner points to it, recover from the error. It’s checking that you know and come up with the answer when thinking about it, not if you can know the answer by heart and immediately blurt it out fast in a stressful situation.

What kind of reactions to expect from the examiner?

I don’t have recordings of what I am saying or how I behave during the exam. So it’s just an “introspective” statement of what I intend to do and what I think I do. During an exam, I (and the sensor) must focus on the questions and answers, on what exactly is said, all concentration is on that. That’s also the reason why doing an oral exam is actually pretty exhausting (being questioned in an exam of course, as well). Anyway, one has no mental capacity to observe oneself. Afterwards, one can try to reflect on it, or the sensor remarks things (“I think your second question was not very clearly formulated” or “you should give the students more (or less) time to answer”, or whatever). But not during the exam.

Anyway, as examiner one gives feedback. Of course, when a candidate asks something like “can I illustrate it with an example?”, one says no (or more probably yes), that’s obvious.

But also without being asked there is feedback, an exam is also a dialogue, not an iterated monologue. There’s a couple of things I try to keep in mind. First, I don’t want to be too negative. I don’t want to communicate by body language, facial expressions, or words that it’s going bad, even if it is. Of course, if a question is misunderstood or an answer goes in the wrong direction, I need to try to put the answering process back on track (see the paragraph called “no dead alleys”). That’s done by words (“ok, I understand, before you continue, let me repeat and reformulate the question”), not by frowning, exchanging glances with the sensor and sharing a chuckle, or a face palm…

Actually, I have the impression, that a few candidates try to “read” the examiner, consciously or probably unconsciously. That may divert mental capacity from answering the question to the attempt to getting a feeling if the examiner is “happy with the answer”. But maybe some people have antennas for that and it’s natural and comes easy for them, I don’t know. Sometimes one sees people tentatively saying a partial answer, hesitatingly, without committing themselves, as if fishing for hints in which way to continue. I don’t know how successful it is, especially when it becomes too obvious… Anyway, try not to give too obvious body-language signals of what I think of an answer.

One the other hand, doing a complete robot-like poker-face during the exam to prevent fishing for answers is not possible. On top, it can create an uneasy atmosphere. It’s hard to talk to someone without receiving a slight nod here and there or a “Hmhm, ok, I see”. One can make people feel uncomfortable even stressful when showing no reaction at all. There’s even a name for it, it’s called the silent treatment…

So we don’t do it. The above reactions like “ok, fine” or “Hmhm, I see” are not meant as “that’s correct” or “that is what I want to hear” as answer. As bottom line, “ok” simply means, I am still following, I have heard and understood what’s being said, and if I don’t intervene beyond “ok”, then I see not need for ending that line of answering.

If I say “ok, that was correct”, or “ok, very good”, that is confirmation that the answer was correct. Actually, people mostly don’t need this confirmation to know themselves that their answer is correct; but there’s no harm in saying it anyway. On the other hand, most people are also aware while answering when the answer is not correct or evasive, or wishy-washy or delaying the real answer, or when unsure about the answer. So one does not have to explicitly state “ok, you were swimming here”, people mostly are aware of that, I think (I know that for a fact for myself). I could say “ok, let’s look more concretely at…”. But the latter could also be asked as just follow up for more information, it’s not necessarily meant as to communicate “I think you’re swimming”.

Are these hints useful in any way?

Perhaps they are, perhaps you think “ok, good to know”. On the other hand, if you think about those pieces of observations, they might actually not really useful for preparing, like giving actionable advice. They just describe behavior that I see repeatedly during oral exams, some with positive effects some with negative. But there is anyway not just one proper way of answering, different people handle dialogue differently. For instance, when saying, it’s better not to blurt out an answer before even the question is finished, but it’s also not good to remain silently for five minute before coming up with a crisp and to-the point one-liner, well, sure. But it does not give guidance like “during exam, I should collect my thought for 10 seconds, that’s the best and recommended”.

That specific advice makes no sense, and one is not graded for how many seconds it takes to start an answer, for instance. But the smoothness, structuredness and, of course, correctness of an answer counts. And of course, if every small answer takes 10 minutes, not much ground is covered, and that’s also negative. The fact that answers come super-slow is mostly a symptom of not being familiar enough with the material. So it cannot be addressed (during preparation) by training how to speak quicker, it’s addressed by learning the material better.

That answers come slow (or hesitatingly or not directly or with a lot of hints etc) may have also a slightly different reason. There is to some extent the phenomenon “I know the answer, but I don’t know how to say it” (though I maintain to understand something really means to be able to explain it). This “I cannot properly say it” effect that can be addressed, and I talk about it in the “How to prepare for the exam?”.

How to prepare for the exam?

Having discussed what to expect, the question is how to prepare for the exam. To some degree, it’s the same as would be for a written exam. The usual general advice, start-in-time, follow the material to some extent during the semester etcetc. Nothing new there.

First things first: Know your stuff!

That’s clear and generally not different from other forms of exams. There are, however differences what it means to know one’s stuff.

Drawing a parallel to written exam preparation

Written exams can be “open book” or “closed book” exams (for FP it traditionally closed book). For open-book exams, certain questions make no sense, like “what’s memoization”. But for open book exam, it obviously makes no sense to ask that question.

But even being closed book, the written exam for FP is mostly about solving problems, similar to the ones from the exercises or obligs. A collection of the written exams of previous years is also available, so one can look at the kind of questions that have being asked.

Those problem sets are intended for a 4 hour exam. The question are estimated to be solvable within 4 hours, provided one has solved or tried at least similar problems before, as preparation. Just “knowing the concepts” from the lecture without ever having done exercises oneself will probably be not good enough for a smooth sailing through the 4 hours (not even if it would be an open-book exam).

Anticipating the exam and planning the battle

Why talking about preparing written exams, when here it’s about an oral one. Because the underlying principles of how to prepare are the same. Some vary. Knowing the stuff, as said, is still the basics.

As mentioned, time is too short to solve a complete new programming task like one from a typical written exam, but still, there may be questions about “how to do this or that?”. That means one know how to address a problem, the direction of problem solving, something I sometimes called battle-plan.

Especially in a oral exam, if there are algorithmic problems to address, it’s not fancy ones, no problems that are large, or that requires some clever insight, no puzzles to be solved. It’s a bit like what I discussion also about “no trick questions”.

So there are standard problems, and one should be aware without much hesitation with what Scheme patterns one could address them (like when working at a list, one needs to do a list recursion, and one knows what typically the base case and the recursion case(s) is or are, and one has not to search long for such concepts).

One can then explain the problem, explain what steps should be taken, and why, while trying to sketch the code while taking. There is typically no time to actually code a fully a runnable solution (e.g. I will say: good enough, it’s fine). The point is to convincingly give the impression:

I can solve that, given enough time, with techniques from the lecture, and I can explain the steps it takes to do it.

Concretely solving it as in a written exam, of course also gives a convincing impression that one can solve it, but that takes too long time.

Not all or not even the majority of questions in the oral will be problem solving, there will be also conceptual questions. Like: “what’s tail recursion? What’s memoization?” Those may lead to code or “programming” problems, of course.

Conceptual question also those need (additionally) a battle-plan of a slightly different kind than the problem-solving battle plans, like “when being asked about memoization,

What concretely do I say, how do I structure my answer? Which example will I offer. If I don’t offer an example, what will I say if the examinor asks me for one. What else could I say in that connection?, Do I know what memoization is good for?

The battle plan is not reading about memoization one more time and nodding and thinking “all right, I think I get it”. It’s about being prepared for exam situation, anticipating it. Trying to concrete think about “what concrete words will I use when asked about memoization?”, even verbalizing it loud or writing it down: It’s more than . It’s good to “get it”, better is to double check “can I speak about it and explain it”. It’s like with preparing for a written exam. It’s not idea to read exam questions or exercises, read up the solution and nodding and thinking “all right, I think I get it”.

So hen preparing, one has to ask oneself “can I speak meaningful, relevant (and correct) things, for some time answering that question?”. Ideally in some structured form, like starting generally, going deeper, sketching some example etc. This may not the only way one can structure an answer, there can be others, but in general some structure is better than no structure, like hopping from one small piece of concept to another one, just in the order they pop up in the mind.

Even if they know one’s stuff, its for most not ideal if the actual exam is the very first time the words come out of the mouth. That’s what I meant that it’s a bit like

I know the thing you ask, I really do, but I never thought about how to say it, therefore I have a hard time now collecting my ideas, aligning my thoughts and actually saying it.

Some people are naturals, knowing the stuff means directly being able to lay it out in words clear and crisp on any given topic. But not for everyone. Sometimes one hears (not just in the context of exams) things like “I know it basically very well, I just cannot say it”. That dubious. I really believe: if one really knows something, one can to some extent explain it, even if it may not be elegantly formulated, one may stutter or the answer is rather unstructured, but still, one can communicate it. In an exam, a messy answer (that may additionally need lot of help) is kind of a proof that the candidate “knows” the answer, and can “explain” it, but it still counts less as a smooth explanation.

But what I believe or not is actually not too relevant: An answer like “I know it but I cannot say it or write it down” (no matter the help) is not worth much. Even if it were true, how could one know.

That form of preparation helps in more than one way. Firstly, as said, it’s typically not a good idea if the actual exam is the first time one searches for words to express something. Secondly, if one is critical to oneself, the attempt to really say things can show where one perhaps should read up a bit more. Finally, just the fact that one forces oneself to verbalize stuff in an clear way helps actually learning the stuff itself. It’s not the only way to learn it, but it contributes.

Same can be the case writing it up in one’s own words. Of course that takes time, it’s not clear if that would be an efficient use of preparation time.

One could also try to compromise: not writing up everything, but condensing a topical area into a number of items, keywords or memorizable cues. That requires focusing on the important stuff, organizing and structuring it, distilling it, perhaps writing it up with tiny handwriting to a small memo paper or sticky note. That’s of course the good old cheat-sheet technique. Organizing material in such a way is a good way of memorizing it, and one can go through the cheat-sheet memos before the exam.

Of course, using cheat-sheets in the exam is not allowed (but for an oral it does not matter as one cannot use them anyway). But writing cheat-sheets and using them to learn, I think is still allowed…

I stressed that verbalizing answers is, in my eyes, a good thing. Additionally, I think, a very good way of verbalizing it is not for oneself, but with one or more fellow students, so

Explaining concepts to others in a good way of preparing. One can even play examiner and examinee. Both profit from that. The examinee is forced to give answers, and what is being asked is controlled by someone else (the “examiner”). Also for the examiner, already listening to the answers repeats the material, and one can learn from it (“That’s a good way for answering, I should remember that for my self”). The examiner can give constructive criticism, but already a “Frankly, that was pretty confusing, I did not get it” may be helpful.

I think everyone profits from such a thing. Already going through the material (speaking or hearing) is a repetition. This form is not a replacement of first-time learning. One must have a certain level of learning progress and understanding before explaining things to each other or doing a mock exam. That’s clear, if no one has read Chapter 3, one cannot explain it to each other. Also if only one has read it but not the other, it may feel a bit unfair, so everyone should have at least some understanding.

Some remarks as reaction to the written exam 2023

The written exam, as basically always, also had some conceptual questions, This year, one small question of that type was what’s a procedure object?. That was the very first question, and there were some others later)

This question was answer disappointingly. Fact is, it was the question with the lowest percentage point score of the whole exam. To some lesser extent the other conceptual questions were answered not well.

For an oral exam, where conceptual questions that will play a larger role than in a written exam, the unability to explain things like that is problematic. In the written exams, where those questions had not many points riding on it, not much damage was done if one cannot say what a procedure object it. In an oral exam, the inability to explain things weighs heavier.

As illustration, let’s take a concept that everyone “knows”: recursion. That’s almost never asked in a written exam, one just assumes that everyone knows. Many of the coding problems in the exam will use recursion, and if one sees as grader that those problems are more or less solved, one can conclude the candidate can solve problems that involve recursion. And in that sense recursion is “understood”.

In an oral exam, one may start a line of question by just throwing in the question

What’s recursion?

maybe intended as a soft-ball, warm-up question. Being unable to answer that makes a bad impression, probably worse than having missed maybe 2 out of hundred points that such a simple question would have harvested in a written exam.

The question (also in an oral exam) is intended as an easy one that should take not much time. Therefore, the best answer itself should be not concise (= not too long, correct, and precise). Of course, as explained, one could extend the answer by volonteering to add information about tail-recursion etc., but that’s not the same as being imprecise or short.

The lecture material seldomly gives explicit definitions (as one can sometimes find in theoretical, mathematical, maybe alor also other kind of lectures (maybe from the law faculties ect). So there is no statement one-liner in bold face, preceded by the something like “definition 2.3.5” like the following that one would be expected to remember (and perhaps reproduce) as the one and only correct one: “Being recusive for a procedure means it calls itself in its body, directly or indirectly. End of official definition”

But when asked the answer(s) and the follow ups are graded how well what’s being said is correct and shows understanding of the material. And clear and concise and structured is better than, well, unclear, wishy-washy, or confusing.

What about code?

A written exam is to a large part about coding small examples. The text and advice here contained large parts about how to structure answers, how to answer, and how not and how to prepare for an oral situation. As illustrations, often I used conceptual questions (“What’s memoization?”). It reflects the fact that such conceptual questions and examining for understanding concepts plays a larger role. And that there’s no time for posing an written-exam-style coding question and wait until it has been solved.

But that does not mean that code or sketching code or understanding code does not play a role in an oral exam.

It will mostly not involve problems that need some challenging insight in the underlying problem itself. An example from the written exam might be the charity question. It had conventional patterns (let-over-lambda, procedure-based objects etc) that had been thoroughly covered, but also a twist, namely the two layers of encapsulation.

So solving it would including mastering the known patterns but applying it to a (mildly) novel situation. In the oral exam, the weight will not be on applying Scheme to really novel problems, but to more standard ones (however no garantee that all code examples show up literally on a slide or similar). Also the “coding question” may be not to code or sketch some Scheme code by the candidate, but that the examinor shows some code (which one may have seen in this or similar form), and asks to explain what the code does. The purpose is not to check if one remembers the code, but to see if one understands small pieces of code following known patterns from the lecture and one can make sense out of it conceptually. So a not-so-good answer is to “explain” things like “in the first line there is a define and it defines fac which I remember is something the lecture called factorial and then there is a newline and a parenthesis, and then an if following by more parentheses”. I am exaggarating for the purpose of illustration, but such low-level “explanations” show no deep understanding of what’s going on.

But also for possible questions or answer involving code (either when asked for code or as part of a conceptual question or when asked to explain a given piece of), one can prepare by anticipating that: “If I am asked to explain tree-recursion, and I am asked to give an example, which one would I take, and how would I sketch it and what do I say”.

(Not so) abstract syntax trees

2023-11-03T00:00:00+01:00

In weeks 11 and 12, we take a look at the meta-circular interpreter from Chapter 4 of SICP. There are different parts we discusses, the central one the mutually recursive procedure eval and apply (resp. mc-eval and mc-apply when we want to use the names as used in the slides).

But another part was dealing how to represent expressions, the syntax of “our” Scheme. In the lecture, I also called it abstract syntax, a word that also SICP briefly mentions in that context (on page 364). When browsing through the code of evaluator.scm, it all looks rather detailed and concrete and not really very abstract. Calling it “abstract” may sound puzzling:

But abstract syntax is a known concept and terminology for basically every implementation of every programming language, where by implementation we mean an interpreter (meta-circular or just a ordinary one) or a compiler for a given language. The user of the programming language normally does not to bother about what’s known as abstract syntax, only those who implement a compiler or an interpreter need to define and work with that. The standard programmer of course need to know how to write syntactically correct programs in the chosen language. The syntax the programmer experiences is called surface syntax or user syntax or concrete syntax. Actually, it’s mostly just called syntax, as users might not be aware that there is a second, internal level of syntax called abstract syntax, thus no need to call the user-level syntax “concrete”. But a interpreter- or compiler-writer distinguishes both.

Lisp or Scheme syntax is simple. Scheme is also simple in that it supports only a limited selection of constructs to the user. Of course Scheme or Lisp distributions may ship with large and complex libraries, but the core is pretty slender (a little bit of syntactic sugar, like let, notwithstanding). But that’s not the main reason why, as we said, Scheme’s concrete syntax is simple. Concrete syntax is what the user sees, abstract syntax is an internal data structure or representation of that syntax. What the user sees is the code in form of a string or a stream of characters (in one or more files). A string or similar is likewise a data structure, but it’s a very basic one, and a string is actually very unstructured in the sense that its structure (being an array of characters maybe) has nothing to do with the syntactic structure of programs in a programming language.

Also a user reading the code (represented by a string) does typically not mentally perceive the code as a large array of characters. The trained eye perceives in a piece of code a procedure, a loop, and a further loop nested in the other one that contains a case distinction followed by another loop, etc. Of course, not all strings correspond to proper programs in a given language. For instance, the following code snippet would be a string that represents a syntactically correct (part of a) JavaScript program

for (let i = 0; i < 5; i++) {
  text += "The number is " + i + "
";
}

but violates the rules for user syntax for many other languages. User-syntax is often designed to make the code easy to read for the programmer (though what exactly that entails depends on the taste, preferences and exprerience of the programmer).

So, concrete syntax is concerned with which strings are syntactically allowed representations of programs and which not. While having a program represented as a string may be useful for the programmer (having file editors and and browers at hand, and being used to read texts, even code in text form), strings are a bad idea when running or interpreting a program.

At the beginning of the lecture, we explained what happens when running a Scheme program by the so-called substitution model (that was when we were still in a purely functional setting). That’s based on substitution or replacement, and illustrated in the book or the slides by replacing step after step a formal parameter by the corresponding argument in the body of a procedure. That’s a purely textual manipulation of symbols written on the slide or the book, and thus it can be seen illustrating string manipulation (string replacement and string matching). Indeed, one could in principle come up with an interpreter that works like that, massaging the string that is writting in the langauge’s concrete syntax. That would work for a purely functional setting; for languages supporting side-effects (almost every language, that is) interpretation purely based on substitution would break down, one would need to implement structures like environments and frame. Still, one could use substitution for stepping through the code while maintaining and updating the enviroments for book-keeping of the (mutable) content of variables.

At any rate, basing interpretation on string substitution is a terrible idea, thus it’s not done. There are at least 2 reasons for that. One we have mentioned: strings as data structure are too unstructured for the purpose. Substitution is about replacing one piece of syntax by another piece of syntax. The piece of syntax being replaced in our contex is replacing a variable by an expression, for instance by another procedure in the following situation

  ((lambda (f) (lambda (x)  (/ (f x) (f (+ x 1)))))
   (lambda (y) (* y y)))

The expressions ultimately stands for the function $\frac{x^2}{(x+1)^2}$. To think of the step from the above lambda expression to the subsequent one, replacing f:

  (lambda (x)  (/ ((lambda (y) (* y y)) x) ((lambda (y) (* y y)) (+ x 1)))))

as a string-manipulation is not helpful, and it’s not what a human reader normally does. The brain, with a little training and experience, is able to perceive the parts of string as anymous procedures, and mentally operate by replacing the formal parameter by the argument procedure to understand what’s going on. Thinking of it in terms of an array of characters is just not a useful level of abstraction. Even worse would be to think of it as manipulation of a bit-array (thinking of the characters as sequences of bits). On that even lower level, a computation step could be understood as a substitution of bit-sequence inside another, though maybe one should not use the word “understand” in that contect…. We said that when reading a piece of code like the above string, one perceives them (with some training) as procedures, applications, a multiplication expression etc, one can also say, one parses them into procedures etc. That use of the word “parse” fits to the definition the word (among slightly alternative readings of the word) one finds at Merriam-webster:

to divide (a sentence) into grammatical parts and identify the parts and their relations to each other.

The grammar of a language is concerned with syntactical aspects of a language. So instead of grammatical parts, one could also say syntactic parts. So parsing a sentence is concerned with its syntax (identifying its syntactic part resp. rejecting the sentence as not following the grammatical rules governing the language). It’s not concerned with the meaning of the sentencr (or rejecting it as meaningless, but otherwise syntactically correct).

Parsing, as defined by Merriam-Webster, is a word used by linguists an describes what linguists believe what happens when someone is reading a text or a sentence in a natural language. In order to ultimately understand a sentence, one needs to identify its separate parts, subsentences, nouns, verbs, particular words and how they arrange to a full sentence. That involves figuring out where individual words start and end and determining whether a word is an actual word (maybe as listed in a dictionary). For written text, determining where a word starts and ends is relatively straightforward, as they are separated typically by some space (or maybe by a full stop or comma, which structures also sentences). For spoken languages its more complex, but separating and identifying individual words and then identifying the syntactical or grammatical structure over the words needs still to be done. As said, linguists call that parsing.

And of course, the problem exists also for programming languages and the task analysing and breaking up a text, i.e., the source code of program into, its syntactic constitutents, that task is called, as discussed parsing, and it’s done by the parser.

What the parser thereby does is taking the unstructured string which is written in the concrete syntax or user syntax, and turns it as a result of the parsing process in a data structure respresented as abstract syntax. Those data structures are trees, and they are called, unsurprisingly abstract syntax trees. For a concrete program, the root node of the abstract syntax tree represents the whole program, and each child node in that tree represents the immediate syntactic substructures. For instance, a possible tree for the JavaScript for node may have two children, one representing the loop-condition and the other one the loop-body. If JavaScript had also other kinds of loops, maybe a repeat-until-loop, the node must also contain the information, that it’s indeed a while loop and not a repeat-until loop. Structurally both syntax constructs may be analogous (two children: one a predicate, another one for the body), but for executing the tree representing the loops, one needs to distinguish them, as both kinds of loops behave differently.

So, abstract syntax refers to trees, and it’s called “abstract” as in the step from concrete syntax to abstract syntax trees, some details are typically omitted. For instance the whole “placement” of the individual characters (line numbers) is irrelevant, newlines and comments are omitted. To stick with the JavaScript example: the fact that the body of the loop is written in concrete syntax with { and } as marking the beginning and the end of the body is irrelevant (as is the fact that one has to write a ; at the end of each statement) and will typically not be represented in the AST. Those details of the concrete syntax are left out, and the abstract syntax tree only contains the syntactic essence of the program, not its concrete textual string.

Now, how is it in Scheme? Scheme famously uses parenthetic prefix notation, something rarely used in other languages. As mentioned earlier, concrete syntax is meant to aid the coder and reader to read of a piece of code, so to say to make human parsing as easy as possible. Though, as also said, what’s easy and agreeable depends on personal experience and taste and for Lisp veterans, perhaps writing and reading piles of parentheses is easy and the most agreeable way of writing programs. For parsers, on the other hand, a parenthetic structure and the Lisp syntax is most agreeable indeed. For once it’s unambiguous. If your language allows to write a numerical expression like 2 + 3 * 4, most people would understand after some calculation that this represents or results in the number 14, since one has to perform the multiplication before the addition. That’s because most users are trained to read or parse such expressions in that particular way. Such syntactic ambiguities not only exist for mathematical infix symbols, but also other programming constructs may “suffer” from that. For the trained user, it’s hopefully not a problem, but for the parser it is, it need to be programmed to parse the concrete syntax “the right way”, that reflects the intentions of the designer of the language.

For Lisp or Scheme, there’s no such problem. The parentheses make very clear which things belong together and where an expression starts (at a left ( parenthesis) and where it ends (at the corresponding right ) parenthesis). In combination with the fact that this syntax is uniform makes parsing almost trivial. Uniform means there’s no special cases like other kinds of parentheses (like (..} for this purpose and {..}, <..>, and [ ..] for others, there is no sometimes infix notation, sometimes postfix. Nore are there hardly any restrictions on what kind of constructs can be used in combination with others. For instance in Java, one cannot define methods inside methods: one can (since quite some time) nest classes, but not methods. You can have a tuple as argument to a method, but not as return result. etc. All those are syntactic restrictions, that the parser has to implement, but for Scheme or Lisp, the problem is almost trivial. Amost as long as every opening parenthesis is matched by a closing one, the program is syntactically correct.

Even more: the parenthetic notation is not only a syntactic grouping mechanism, it’s also the notation for lists. And lists can be nested, in which case SICP calls it list structures. But as we discussed in the lecture, nested lists can be seen as trees. What it basically means is that, by coding in parenthetic prefix notation, the programmer directly writes abstract syntax trees. To say it differently, there’s no real distinction between abstract and concrete syntax, we also don’t have to puzzle how to best design abstract syntax trees, they are given anyway. All that benefits the parser and parser writer, whether it benefits the user may be up to debate, some people like the unabiguity and simplicity of Scheme syntax, some may prefer other notations.

Historically, this peculiar design of the (concrete) syntax is factually sort of an accident. Lisp is quite an old language, after Fortran the second oldest programming language still in use (Cobol is the third surviving veteran from approximately the same time, but came a bit later). Scheme was a pioneering design and was in many ways ahead of its time. For instance, supporting higher-order functions and garbage collection, when Fortran, for instance, not even supported recursion. It was also developed at a time, where hardware was seriously limited and where programming was not interactive. A Wikipedia article about the IBM 704, the machine used for the first Lisp (and Fortran) implementation, gives an impression. While at it: one type of instruction format from that particular machine also gave name to the functions car and cdr, as cons-cells were implemented making use of the corresponding parts in registers (the address format had an “address” and a “decrement” part).

Not only was hardware limited, concepts and experience how to develop a programming language were missing, no prior languages existed to improve on (except machine-specific assembler) as those were the very first higher-level programming languages ever, no conceptual framework, no text books, no computer science curriculum, no nothing, just bare metal… One conceptual thing that was not yet well developed was a clear notion of how to specify and discuss the syntax of a language. That came only a bit later, in the context of Algol 60, where for the first time, the syntax of a programming language was clearly written down in a document (using what is known as a (context-free) grammar, written in so-called BNF format; such notation of grammars is a domain-specific language to describe syntax). Before that the “syntax” of a language was just what happened to be accepted by the compiler or interpreter. Of course, the developers reflected on it, and try to make good decisions. But there was also not yet a coherent body of parser technology and theory, so one had to program some program that somehow allowed to input a program (maybe from a bundle of punch cards), and then pick it up from there. The developers of early Scheme (and Fortran and other languages prior to Algol 60) would no even think explicitly about abstract syntax trees and concrete syntax trees (at least not use those words).

read

S-expressions and M-expressions

Coming back to Lisp or Scheme-syntax: the parenthetic expressions that represent the syntax of programs as well as lists are called S-expressions, and they are the concrete as well as abstract syntax for Lisp. One way of seeing it was that Lisp was and still is actually lacking user-syntax and instead let the user directly code in abstract syntax. Other languages at that time did not do that, and with time and after Algol 60, people were starting to think more systematically about how to carefully craft syntax, how to systematically parse it, and understanding what can be done by a parser and what not. There was attempts or initiatives to equip Scheme with a user-level syntax, on top of the S-expressions as notation for abstract syntax trees. This attempt is known as M-expressions, but it actually fizzled out. As McCarthy seem to indicate in History of Lisp, users of Lisp had already gotten used to programming in S-expressions, and saw no benefit in adopting a different syntax and perhaps porting the growing code base to that newer format. In that sense the language design is a historic “accidence”: the abstract syntax came first, the initial and landmark design focused on the hard problems (higher-order function etc), not on notational matters, keeping parsing trivial, and despite some (feeble?) attempts to afterwards come up with a more conventional concrete syntax, Lisp had already taken off, and it was too late.

Abstract syntax as abstract data type

Of course

 (+ 2 ()))
. #%app: missing procedure expression;
 probably originally (), which is an illegal empty application in: (#%app)
> (+ 2 ))))
2
. read-syntax: unexpected `)`
> 
read-syntax: unexpected `)`

(cons 4)
. . mcons: arity mismatch;
 the expected number of arguments does not match the given number
  expected: 2
  given: 1
  arguments...:

Y not code up $n!$ with no recursion and no Y tricks either?

2023-10-20T00:00:00+02:00

In another post, I dissected how to program recursion using anonymous functions, only. This ultimately leads to the famous $Y$-combinator. Actually, there are multiple versions of it, all doing the recursion-trick by some form of self-application. The running example in that blog post was the inevitable factorial function.

If that was not weird enough, here is a different and, arguably, even stranger way to program factorial. Namely:

without self-application, without any of the fix-point combinators like $Y$ (and of course without actual recursion and without other cheap tricks like using while or some loops that the Lisp/Scheme dialect may offer).

In the last post, the solution for factorial was straightforwardly generalizable to any recursive function, and that generalization was the $Y$-combinator (and its variations). This time, we won’t be able to generalize the construction, at least not for all recursive functions.

Intuitively that’s easy to understand. The $Y$-combinator allows to cover all recursive definitions, including those that result in non-terminating-procedures. Recursion corresponds to a higher-order functional version of capturing the essence of while-loops from imperative languages. Those can lead to non-termination as well. There exist also looping constructs that are guaranteed to terminate. Those are conventionally called for-loops. Never mind that some concrete programming languages like Java use the keyword for for general loops, including those we (and others) call while-loops, to distinguish them from their “weaker” siblings, the for-loops.

If we come up with a scheme to capture something akin to for-loops it means we cannot expect to capture non-terminating functions. But the factorial will be fine.

The $Y$-combinator (and its variations) are somewhat convoluted expressions using only (anonymous) functions applied to themselves. They can be given in the untyped $λ$-calculus, and one can program $Y$ in Scheme, though one has to be careful to take into account that Scheme is an language using eager evaluation, an aspect not typically considered when dealing with a $λ$-calculus (though of course one could focus on an eager $λ$-calculus, if one is interested in that).

The factorial function has other aspects, which are not actually part of the purest of $λ$-calculi. Pure here not in the sense of purely functional and without side-effects. Pure in the sense of “functions only!”. Remember: the first two chapters of SICP cover “building abstractions with procedures” and “building abstractions with data”. Actually the treatment of procedures comes before treating (compound) data.

Of course, without data to calculate on, there are not many things procedure can work with and compute on. One can of course define weird and powerful stuff working purely with procedures, like the $Y$-combinator, but that’s hardly a way to start a book teaching functional programming (actually SICP hardly even mentions the $Y$-combinator, it just crops up in a footnote in some exercises). Besides, when $Y$ to some function, say $F$, it still does not compute some real stuff: $Y F$ defines a function that can behave recursively, but to run that, we need to give some further input. So in order to get going, the procedures need to have some non-procedural data to operate on.

Like all programming languages, Scheme supports a number of built-in primitive data types. The most readily available ones are perhaps numbers and that’s why many initial examples in SICP are “numerical” in nature (lists, for instance comes later). Maybe one walks away with the (initial) impression that Scheme’s prime application area is number crunching. That’s actually the more or less the opposite what Lisp (and thus Scheme) was originally intended for. It was originally touted as language for “symbolic” computations, working on symbols, and the language of choice for artificial intelligence. If we take the trinity of three early, major, and surviving programming languages, Fortran, COBOL, and Lisp, Fortran was for number crunching, COBOL for “business” and managing data records, and Lisp, as said, for symbolic computations and AI.

Ok, Scheme supports numbers, but the pure $λ$-calculus does not.

In the lecture, we saw that higher-order procedures are powerful. As discussed, one can express recursion with them. Also, in connection with data structures, it was shown how pairs can be expressed by higher-order procedures. Pairs, like numbers, are built-in in Scheme, but SICP and the lecture showed, how to program the constructor and selectors for pairs (cons, car, and cdr) using procedures. Not that there would be a need for that, as they are built in, but if wished, it can be done.

Ok, then, what about (natural) numbers? At this point one may have guessed what the answer will be: yes, natural numbers can be encoded in the $λ$-calculus. At a general level, it should not be too surprising. If one has heard that the $λ$-calculus is Turing-complete, i.e., is expressive enough to compute everything a Turing-machine can compute (and thus everything that a full-blown programming language can compute), it’s implicit that somehow it must be possible (but maybe tricky).

Encoding numbers by procedures may seem like a pointless thing to do and anyway not needed (in Scheme) as they are built-in. That’s a valid standpoint, but one should also not forget that built-in is not the same a God-given. Numbers and other data structures may be built-in, but they won’t be directly processable by the ultimate hardware or platform. There will be an encoding, and it’s likewise complex. To work on standard hardware, maybe not us, but someone ultimately needs to encode numbers by $0$’s and $1$’s and to encode operations on number like addition by some manipulations of those bits. The encoding of course goes on behind the scenes (by the compiler or interpreter), and we are operating on the numbers with standard notation and operations which behave the way we are used to. But someone has to take care of the encoding to maintain the convenient abstraction for us.

Encoding the numbers (and pairs and lists) as procedures inside Scheme is of course not advisable from a standpoint of efficiency. Typical hardware can manipulate standard binary encodings of numbers fast and some basic procedures like addition may directly be supported by hardware. Other procedures (like factorial) of course not. Ultimately also they need to be represented in binary form to be run on a machine (or the interpreter that runs the and encoded as binary, in machine code). Standard hardware maybe suited to calculate basic stuff on numbers but not to juggle higher-order functions, at least not directly. Interestingly, there had been attempts to do tailor-made computer hardware for Lisp, those were known as Lisp machines (and they went the way of the dodo…)

Encoding numbers as procedures may seriously degrade performance, but it’s an interesting exercise, and it will allow to program factorial without recursion! The encoding is actually is well-known under the name Church numerals. The natural numbers is only one example of an infinite data structure, lists and trees are others that could be treated similarly. Actually, also finite data structure can be handled, for instance Booleans. All of those data structures could be encoded analogously, if one worked out the principles behind the Church numerals more clearly than we will do. The technique is also called Church encoding.

But we mostly stick to natural numbers as data structure. We approach the task from two angles: what’s the interface for numbers, and how are they represented. While we mostly stick to numbers concerning the encoding, later we will generalize beyond numbers as far as as interfaces are concerned.

The constructor interface of the encoding

The interface angle should be familiar from the lecture material about data abstraction. The goal is to have an encoding that works like the usual one (only quite a bit slower perhaps) seen from the outside. Then what belongs to the interface for numbers? And as we did in other examples, for instance when encoding pairs, the first question to answer is: how can I get natural numbers? That are the constructors of the data structure?

The two constructors of natural numbers are $0$ and $\operatorname{succ}$, maybe called zero and succ in Scheme.

Note, we are not meaning here the (built-in) number $0$, it’s meant that there are two procedures in the interface, and we call them, not unreasonably $0$ and $\mathit{succ}$ to remind us what they are intended for. We have not solved yet how to encode them properly, but once we solved the encoding, we obviously can represent all natural numbers, using that constructor interface. For instance (in Scheme) we could write

  (define seven (succ (succ (succ (succ (succ (succ (succ zero))))))))

Fair enough, that looks like a plausible way of writing down the number we pronounce “seven” in English. We mentioned that the encoding will degrade the performance. Besides that the encoding is also not very space efficient, as the above construct is a value, it’s a notation for 7. We are used to so-called positional number systems, so much so that we tend not to think about it all. For instance $137$ is a fairly compact encoding for a number which would be fairly long if we were forced to write it as with a tower of succ’s… The encoding that the built in $137$ typically uses in hardware, the binary representation with $0$’s and $1$’s, is also short, thanks to the positional representation. One could see the succ-notation as an unary number system (like tally marks, a prehistoric ``number system’’). To say it again, $0$ and $\operatorname{succ}$, resp.\ succ and zero is not the solution how to encode them as procedures, it’s the interface, the symbolic names of the two central procedures, whose internal representation we have still to figure out.

If $n$ is a function, what does that function do?

Being able to write down numbers using those constructor names is all fine and good, but in isolation it is of little use. We need to be able to do something with them, like computing with them.

So, what do we want to do with them? The most obvious thing to do is to calculate with numbers, like adding two numbers, calculating the square of numbers etc. Sure, that’s what one often does with numbers.

But perhaps we should switch perspective. Asking how to calculate with numbers as inputs and outputs, combining them in different ways is too “conventional”, too closed-minded. Remember, we are doing functional programming, where the boundary between data and procedures is blurred. In particular in a pure (non-applied) $λ$-calculus, everything is a procedure and we intend to factually encoding numbers as functions. So not only that procedures are first-class citizens in the sense that procedures can be handled in the same way as ordinary data, like serving as input or output to other procedure in the same way as numbers. It’s even more radical: Everything is in fact a function, including those that we intend to represent as natural numbers.

We learned to think of numbers not as procedures, but as data. Silly us, but that was before we learned about higher-order function… If we are about to encode numbers as procedures, we need to make up our mind (and open up our mind) what kind of procedure for instance the number (succ (succ (succ (zero))) is. If a number is represented not as a bit string or number in decimal notation or some passive piece of conventional data) but as a function, the number can be used do something when applied to an argument. So the switch in perspective is

Don’t ask what you want to do with numbers, ask what you want numbers to do for you!

Okeeeh… As everything in a purely functional setting are function, the question is what can a “procedural” natural number reasonably do when presented with a procedural argument? Church’s idea was roughly:

The number $n$ should be encoded as the function that, when given a function as input, iterates it $n$ times!

The computational essence of a number is its potential to iterate, it represents a for-loop with $n$ rounds (resp. it functional equivalent, since for-loops are structure typical for imperative languages).

One may swallow that switch of perspective, but still may want to complain a bit. It may sound interesting to see numbers are some “loop”. Interesting as that looks, aren’t we missing out an important fact. There is a good reason that standard presentations or encodings of numbers treats them as data. After all, calculating with them, like doing additions, multiplications etc., that’s certainly at least as important as having numbers as iterators, or not?

But on second thought, there is no reason not to have both. After all, having numbers are procedures does not mean they cannot also be treated data as well. Procedures are first-class citizens, right, and data is functions and functions are data.

So let’s just do some good old calculations, like addition. But what’s addition of two number other than iterating the successor function: like $7 + 4$ would be $\mathit{succ}^7 (4)$ (or the other way around), where $\mathit{succ}^7$ is meant as applying $\mathit{succ}$ 7 times to (the encoding of) $4$. Similarly, after having defined addition, multiplication can be explained as iterated addition. This way it seems, standard calculations could be achieved.

Since the sketched idea of the construction corresponds to for-loops and iteration and not to while-loops resp. general recursion, it’s obvious that there are calculations on numbers that cannot be done. Impossible are in particular functions that do not terminate (on some or all inputs). Where exactly the boundary lies, what can be represented with Church-numerals and iteration only (and without doing general recursion) is a question, we don’t explore here. But the boundary is not that all terminating functions can be represented iteratively, and only the non-terminating ones are out of reach. There is another post on the Ackerman-function and primitive-recursive function that discusses some aspects of that question.

The encoding itself

But then, what is the encoding? Actually it’s fairly easy. What we intend is a behavior as follows

\[n f = f^n\]

$n$ applied to a function $f$ corresponds to the $n$-time application of $f$. Another way of writing the same is

\[n f = \lambda x. \underbrace{f( f (f \ldots (f}_n x)))\]

The task then is to program $0$ and $\mathit{succ}$ accordingly (or zero and succ in Scheme). Here it is. Let’s start with $0$. It’s supposed to take a function to iterate $0$ times, so not at all. So $0\ f$ has no effect at all, and thus corresponds to the identify function $\lambda z. z$.

With $n$ as input, the successor returns $n+1$. Both numbers are encoded as iterators we are on the way of programming. In other words, with an $n$-iterator as input, $\mathit{succ}$ returns a function that iterates $n+1$. That leads to the following scheme:

\[\begin{array}[t]{rcl} 0 & = & \lambda s. \lambda z. z \\ \mathit{succ} & = & \lambda n. \lambda s. \lambda z. s\ (n\ s\ z) \end{array}\]

And here’s the same in Scheme

  (define zero (lambda (s) (lambda (z) z)))
  (define succ (lambda (n)
		 (lambda (s) (lambda (z)
			       (s ((n s) z))))))

The factorial

Actually, it’s straightforward. Let’s start by repeating a conventional, recursive definition of the factorial procedure

(define fac (lambda (n)
	      (if (= n 0)
		  1
		  (* n (fac (- n 1))))))

Calculating the factorial on some input $n$ means, going through the body of the function multiple times, building up a large multiplication $n \times (n-1) \ldots 2 \times 1 \times 1$ until hitting the base case. And then calculating the result $n!$. So let’s first give a name for the function that calculates one round of going through the body of the factorial, let’s call it $F$.

(define F
    (lambda (f)
      (lambda (n)
	(if (= n 0)
	    1
	    (* n (f (- n 1)))))))

The body covers the base case, but for the recursion case, it uses its functional argument $f$ for a continuation. If we pile up $n$ of those $F$’s, we can go though the body $n$ times. However, if we need to go through the body more than the number of $F$’s piled up, we fall out at the bottom, and so we need to plug in some continuation for $f$ for that case.

Of course we don’t intend to fall out at the bottom by arranging for a pile of $F$ high enough for the concrete input. In other words, it should not matter what we choose for that. Let’s just raise an error in Scheme, and call the function f0:

(define f0 (lambda (x) (error "Ouch! you should not see this...")))

This represents raising an error, and exceptions don’t fit well to pure forms of the $λ$-calculus. For instance, raising an error does not return a value, so errors don’t evaluate to something, they rather derail an ongoing evaluation. In the $λ$-calculus, non-termination is the only way to program a situation that don’t evaluate to something (and that would need recursion or something like the $Y$ combinator). At any rate, the traditional symbol for an “undefined” expression is $\bot$ (“bot” or “bottom”), and that’s what we use as well. If concerned that one would need recursion to achieve non-termination (representing being undefined), note that we simply need some function to plug in at the bottom and we don’t care which one, it can be any. So one can interpret $\bot$ also as being undefined in the sense of being not specified or arbitrary. And, as shown, in Scheme we raise an error.

As a side remark: note we have defined f0 not as (error "some message"). Doing so would not work. Remember that Scheme is an eager language, using applicative order, and without the preceding lambda, the f0 as argument would immediately raise the exception and derail the planned iteration before it even starts.

Now, to calculate $n!$, we need to iterate $F$ at least $n+1$ times, like that

\[\underbrace{F (F (F .... (F}_{n+1} \ \bot)\]

Since the church numerals exactly capture this form of iteration, we can simply write:

\[\mathit{fac} = \lambda n. ((\mathit{succ}\ n)\ F)\ n\]

Note: We would use the very same $F$, if we would use (a proper variant of) the famous $Y$-combinator instead of Church numerals, and (Y F) would give the factorial. That’s described in a different post.

What’s missing for $n!$? Or: The rest of the interface

We will not spell out the rest of the solution in full detail and only sketch what would need to be done. The above iteration of $F$ works fine, but we have not properly written up the body of $F$.

To say it differently: we have encoded or implemented $\operatorname{succ}$ and $0$, we also have hinted at that addition and multiplication can straightforwardly be defined, plus as iterated successor and multiplication as iterated addition. So we have covered the two constructors from the interface for natural numbers and we were able to do some more useful functions like $+$ and $\times$, but lacking are two other central aspects of the interface: the selectors and predicates. What we and SICP calls selectors is sometimes also called destructors (for instance in the Haskell community). One has to be a bit careful, also C++ and similar languages use the word “destructor” in connection with (object-oriented) data structures. But there it means something unrelated and has to do with object finalization and memory management, releasing memory at the end of a lifespan of some object. Functional languages, from the beginning, starting with Lisp, have automatic memory management, i.e., garbage collection, no functional programmer need to clean up personally…

Selectors are the inverse of constructors, constructors compose a larger composed structure from smaller parts, and selectors, also to access the parts from a composed one. As far as Lisp and Scheme are concerned, the constructor for pairs is cons, and the two destructors are car and cdr. For lists, the constructors are '() and cons, and the destructors are called same as for pairs. It might be clearer if one had used separate names, like left and right for the selectors for pairs and head and tail for the selectors for lists. Many (typed) functional languages would insist that two different abstract data type use a different set of constructors, so a cons (or whatever name or notation would be used for the constructor) would either construct a pair or a list, it can’t represent both constructions (even if internally both might be represented by something like cons-cells). Lisp and Scheme, being statically untyped, see nothing wrong with it.

So then what’s the selectors for natural numbers? The selectors have to give access to ``sub-structures’’ of a compound data structure. $0$ is not compound, it has no parts. So there is no selector corresponding to that. Numbers $n>0$ are structured, there are constructed as $\operatorname{succ} (n-1)$, with $n-1$ taking the role of a substructure. A good name for the destructor or selector this is $\operatorname{pred}$ for predecessor.

Of course the predecessor of $0$ is undefined, $0$ is the smallest number. Analogously the selectors for lists can be applied to non-empty lists only, and applying car or cdr to the empty list '() is undefined, resp. raises an error. So, natural numbers have one selector, the predecessor, and it is indeed the inverse of the constructor:

\[\operatorname{pred} (\operatorname{succ} n) = n \quad \operatorname{succ} (\operatorname{pred} n) = n \quad (\text{for $n>0$})\]

Note that we have not spelled out the implementation of $\operatorname{pred}$ as $λ$-expression or in Scheme. We specified its behavior in terms of how it works together with the constructors (inverting their effect). So that’s the interface contract the implementation of $\operatorname{pred}$ has to fulfill.

We won’t give an actual encoding or implementation for $\operatorname{pred}$ in our Church numerals here, it’s not really hard, but a bit more convoluted (and inefficient). If interested it can easily be found on the internet. The actual encoding is an interesting exercise, of more general interest are the underlying principles, like that the central ingredients of the structures interface are grouped into constructors and selectors/destructors with the requirement that one they are each others inverses. That principle generalizes also to other such data structure, they are generally called inductive data types.

$+$ and $\times$ are also important functions when talking about numbers. Useful as they are, they are not central to the inductive data type, they are build on top, and a natural number package could of course have very many useful functions besides $+$ and $\times$, for instance $n!$ etc.

But besides constructors and selectors, there is a third class of functions central to the interface (and used in $F$). In order to be generally useful, one needs a way of “comparing” numbers. At the very core of this is: we need to check if a number equals $0$ or not. Without that, we cannot separate the base case from the “recursion” case. As discussed at the beginning of this text, we are in a setting without the full power of recursion and we mentioned, that the selector/constructor way of presenting data structures leads to inductive data types. Thus, the recursion case(s) are alternatively also called induction or inductive case(s). Basically we doing inductive definitions, with induction being a restricted, more disciplined form of recursion, working on, well, inductive data types) Induction as proof principle (over natural numbers or over other inductively given structures) is likewise closely connected…

At any rate, for the natural numbers, the most basic form of “comparison” is the one for zero-ness. It’s a predicate that checks whether the base case applies or not. That’s exactly what we also need to program $F$ for the factorial.

With this basic predicate covered, other comparisons and useful predicates can be defined, like $n =^? m$ or also $n \leq^? m$ etc.

The check itself is encoded actually pretty simple. Remember that $0$ is an iterator, actually one that iterates it first argument function $0$-times, which means not at all, and thus gives back it’s second argument. For $n>0$, the first argument is iterated at least one time. So we have just to arrange that the second argument corresponds to true, and in all other situations we have to return false:

\[\operatorname{iszero?} = \lambda n.n\ (\lambda x.\operatorname{false})\ \operatorname{true}\]

or in Scheme.

(define iszero? (lambda (n)
		 ((n (lambda (x) #f)) #t)))

How to generalize beyond $n!$ and numbers: inductive data types and pattern matching

The Church numerals is on the one hand a perhaps strange exercise in exploring the power of higher-order functions, and not useful as actual implementation of numbers. But encoding is not “arbitrary”, if follows patterns that opens the way to likewise encode other data structures. For instance, lists could be done like that (which is another inductive data structure) or also Booleans. In the above code we allowed ourselves to use the built-in #t and #f, but we could have pushed further and encoded also Booleans. We could do so by asking the same question we started with: what can Booleans do when not seeing them as data, but as procedure (and the answer would be: make a decision between two alternatives).

The general principles behind such encodings may get buried underneath mountains of parentheses and $\lambda$’s. More interesting seems to me to focus on the interface talking about the three crucial ingredient of such structures, constructors, selectors, and basic predicates. The latter need to discriminate between various constructors, making the appropriate case distinctions.

In Scheme, when designing and implementing structured data, such as trees, etc, one of course does not do a Church encoding of those. One relies on recursion, builds say, trees, using symbols as tags, and checks the tags by properly programmed predicates. The centrally built in data structure of lists, which conceptually is an inductive data structure, of course also has the corresponding predicate called null?. So the flexibility of Scheme allows to build inductive data structures in a disciplined manner (mostly relying on the flexibility of nested lists). Not that it’s recommended, but as discussed one could even push Scheme’s flexibility to the limit and use Church’s numerals as inspiration to encode the data structures not by the built-in lists but by procedures.

Not all functional languages allowed the almost boundless flexibility of Scheme. In particular typed functional languages impose much more discipline on the programmer. For instance, the $Y$ combinator will in all likelihood no longer be programmable in a type-safe manner, and also Church tricks run into trouble and may no longer be accepted by the type system which in itself seems like not a big loss… However, the type system might easily get into the way to forbid nesting lists in flexible ways to serve as some disciplined inductive data type.

But inductive data structures are eminently important and an programming language need to support or at least allow them, without the type system getting into the way. Type functional language typically “integrate” inductive types as part of type system. After all, the structures are called abstract or inductive data types for good reason. In Scheme, without a static type level that would imposing some discipline, following some discipline is the programmer’s responsibility, coming up with a collection of procedures, perhaps grouping them personally into selectors, constructors, and predicates and conceptually think of all that as (inductive) data type. But for Scheme, it’s all procedures and a collection of lists and cons-pairs and it falls on the shoulders of the programmer to disciplined enough to use the interface, and not use directly combinations of cons, car and cdr, knowing how the interface is implemented using nested lists… Type systems supporting such data types enforce the discipline.

Using some concrete typed functional language as example, one could define the natural number inductively as follows. The code concretely is in ocaml, some ML-inspired language, but many typed functional language will support more or less similar notations.

type num =  Zero | Succ of num

Zero and Succ are the two constructors (constructors have to start with capitals, at least in ocaml), and num is the name given to the inductive data type it. We are no longer talking about of church numerals, which is mainly about encoding such inductive data structures in fancy way. We focus on the principles underlying the interface of such structures that we distilled when discussing the encoding. Of course the interpreter and the compiler will have to come up with some encoding, we don’t really care how it’s done, but we can be pretty sure, it’s not encoded by higher-order procedures …

Of course also ocaml or similar languages have numbers built in already, so we would actually of course not need to define the type num. We use it for illustration.

With the type definition we have covered the constructors of the interface. We can construct number like Succ (Succ (Succ Zero)) as representation of for $3$. Of course, the numbers won’t work as iterators (and we actually don’t miss that aspect much either). But what about selectors and predicates?

Actually, those are conventionally combined in an elegant manner in typed functional languages, namely by pattern matching. A typical use is in combination with a “case switch” to cover different shapes of the data structure, for our nats there are two cases, one base case and one inductive case (and using recursion):

let plus n m =
  match n with  // matching
    Zero -> m
  | Succ n' -> Succ ((plus n') m)

In Scheme, we used the predicate iszero? which covers the base case. The match here explicitly checks both possible cases, and note how the pattern match combines the check which case it is with the selection or deconstruction of $n$: the predecessor $n’$ of the argument is just mentioned in the matching expression.

If we would explicitly need the predecessor selector mentioned earlier, we could program it as shown below. Typically one would see no need in doing so, as its functionality is typically exploited in combination with matching and with a case distinction. Not just because it’s “elegant”, but to protect against run-time errors. Remember that pred on $0$ is an error, so it’s good practice to combine the use of pred with a prior check whether iszero? is false. And that’s exactly what the above body of plus does with the match and the case branches.

Anyway, if one needs the unsafe pred, the selector that does not care checking if there’s something to select, one could simply write

  let pred (Succ n') = n';;

The real selection functionality in done by matching the argument against the pattern Succ n', so that’s elegant enough as selection mechanism. That’s why we said above that one typically might not even see the need to give that match a name like pred.

The type system may prevent the flexibility offered by Scheme, but on the other hand it can warn us, if we have uncovered cases, for instance for the definition of pred it warns us:

Warning [partial-match]: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched: Zero

We could get rid of the warning by issuing a tailor-made error message (or doing something else for $0$).

let pred n ->
    match n with
      | Succ n' -> n'
      | _ raise (Failure "Pred of Zero undefined");;

Still, pred is only partially defined, but as long as we don’t go to negative numbers as well, something does not fit if we really want the predecessor of $0$. And as said, the selection is best done in combination with a case match covering all cases, to avoid running into those troubles.

To sum up

We have sketched the idea of Church numerals, a procedural encoding of natural numbers. Each number $n$ is represented as a function that corresponds to an iterator or a loop of length $n$. All numbers can be defined by two constructors, we could call $0$ and $\operatorname{succ}$. Since these number are actually iterators, one can use the numbers to define further useful function (by iteration). The full power of recursion can’t be done this way, all procedures will terminate, but it’s enough to program factorial.

Focusing on the interface, we stressed that besides constructors, the core of the interface of a data structure like nat involves selectors and predicates to make the necessary case distinctions. Those kind of data structures are called inductive data types. Typed languages support constructors allowing to introduce types by specifying the constructors. The functionality of selectors and predicates is achieve in a combined manner by pattern matching.

There’s still something missing

Y Y?

2023-10-09T00:00:00+02:00

This is another post in connection with some slides shown in the lecture, which may have been a bit obscure.

The text here is concretely triggered by a slide in week 9 about “recursion with anonymous procedures”. The slide showed a version of the factorial function programmed in a way unlike any we have seen before (and unlike any we will see afterwards). And programmed in a rather obscure way. The factorial is programmed without recursion, in that there’s no procedure that calls itself, at least not in an obvious way. It only uses $λ$-expressions, i.e., only anonymous functions.

Recap: Coding factorial using only anonymous functions

Let’s start out with a recursive definition of fac. fac is bound to a $λ$-abstraction, and in the procedure body, fac is mentioned and called. Probably we got used to recursive definitions meanwhile that we don’t puzzle about that too much.

(define fac
    (lambda (n)
      (if (= n 1)
	  1
	  (* n (fac (- n 1))))))

Perhaps it’s worth to point out a crucial difference between define and let. It’s not possible to define fac using let as follows:

(let ((fac
       (lambda (n)
	 (if (= n 1)
	     1
	     (* n (fac (- n 1)))))))  ;; note that fac is
                                      ;; introduced via
                                      ;; let!
   where fac is intended to be used>)

Let works similarly as define (though it has an explicitly specified scope): it binds fac to this lambda-expression. However, this time it won’t work as intended, as fac is not yet defined. If you try that example yourself in some scheme interpreter, make sure that fac has not already been defined earlier, otherwise it will look as if it worked insofar the correct value comes out. But in that case, the fac introduced via let simply calls the previously defined let, it’s not a recursive (re-)definition.

While at it: there exists a variant (not discussed in the lecture) of let which would work, it’s called letrec and that would allow an intended recursive definition of fac (and in that respect works analogous to define).

So far so good (and known from the lecture). But now we no longer want to use define to program factorial as above at least not recursively. Nor letrec obviously, nor while or other looping constructs that you favorite Scheme dialect may support (while is often supported. Additionally one can program while easily oneself (using recursion) so that would not help).

Now: let’s look at the $λ$-abstraction in isolation, i.e., the above standard definition just without giving it a name with define.

 (lambda (n)
   (if (= n 0)
       1
       (* n (fac (- n 1)))))

The base case is covered, but the branch that corresponds to the recursion case is not. For $n>0$, the body invokes fac which is undefined, resp. if it happens to be defined by coincidence from earlier, it’s probably not the factorial, as we are still struggling to get it defined. So let’s don’t rely on some unknown thing called fac coming from outside and probably undefined anyway, let’s hand over the missing continuation to cover the recursion case as functional argument:

  (lambda (f)    ;; let's refer to the whole
    (lambda (n)  ;; construction here (a higher-
      (if (= n 0);; order function) as F
	  1
	  (* n (f (- n 1))))))

NB: the term “continuation” has a specific meaning in functional programming and there exists a style of programming which is called CPS, continuation passing style. We are not claiming that the above code is strictly CPS, but there is a connection: We hand over a function that describes how to continue at the end of the function body, here at least that the one possible end that corresponds to the recursive case.

At any rate, let’s refer to the above function as $F$. Given a continuation function $f$ as argument, it corresponds to the body of the factorial.

The base case is covered, and in the recursion case, the body uses the argument $f$ to calculate the return value. Since $f$ is an argument, it can be anything, but what is needed for $n\geq 1$, where recursion should kick in, is to calculate $F$ again, this time with the numerical argument $n-1$. And going through the body of $F$ one more round would probably not be enough. So in the next-round’s recursion case, the same problem would present itself, namely how to continue just another layer of the body, and the solution would be the same yet again: do $F$ one more time, and if needed, still another round, and on and on.

That can be achieved by doing the following in the recursion case, calling $F$ and feeding to that next call to $F$ the function $F$ again, should that next round not be enough:

(* n ((F F) (- n 1)))  ;; recursion case in the body of F

Since $F$ is not only called but additionally handed over as further continuation in the next recursion case, the pattern repeats itself and the pattern can continue arbitrarily long. And for the factorial function, at least with a non-negative input, after a finite amount of repeating itself, the schema will hit the base case for $n=0$, and the correct value of $n!$ will be returned.

We are, however, not out of the woods yet. The previous code snippet mentions $F$, resp. $F F$ in the recursion case. Note that we have not officially named the higher-order function from the above Listing by the name $F$ (doing (define F ....): we agreed among ourselves to call it $F$ in the explanatory text, but not as part of the program.

We could have given the anonymous function officially the name $F$ with define, but what we discussed was that $F$ is used as argument to that function, i.e., in place for the formal parameter $f$. Besides, if we had introduced the name $F$ for the function and then used in the the recursion case, that would be a case of direct recursion using a function’s name, that’s exactly what we don’t want to do.

So: how can we use $F$ as argument to itself, without relying on direct recursion? That’s actually not hard, we just program it two times, and feed the second copy as argument to the first. However, as explained above, we need in the body something to the effect of (* n (F F) (- n 1)). That means, we need to massage the implementation $F$ in such a way, that $n-1$ is not fed as argument to $f$ (as done in $F$ and in the factorial function), but fed as argument to $f f$. That leads to the following massaged version of $F$

(lambda (f)    ;; new variant of F
   (lambda (n)
     (if (= n 0)
	 1
	 (* n ((f f) (- n 1)))))  ;; self-application of argument f

And if we apply that version to itself, we get the following function.

((lambda (f)     
   (lambda (n)
     (if (= n 0)
	 1
	 (* n ((f f) (- n 1))))))
 (lambda (f)    
   (lambda (n)
     (if (= n 0)
	 1
	(* n ((f f) (- n 1)))))))

And we are done, that’s the factorial!

One can test it easily on some input. Of course it looks a bit unelegant, so let’s clean it up a bit. We can introduce a name $F’$ for the massaged version of $F$ and using let to avoid repeating the code, and finally we can give the whole construction a conventional name, namely fac. Note that neither let nor the use of define for fac involves recursion.

(define fac (let ((F'  (lambda (f)    
			 (lambda (n)
			   (if (= n 0)
			       1
			       (* n ((f f) (- n 1))))))))
	      (F' F')))

Factorial is fine and good, but how to generalize that?

The above construction is concretely done for the factorial. Fine as it is, we are interested in doing it generally, i.e., given a recursive definition of a function and turning it to one that works without recursion. And it’s not good enough to understand the way it worked for fac, and when dealing with another recursive definition, do the same trick again for the body of that new function. A convincing generalization would be one that does not involve us, fiddling with the code, like retyping the body $F$ into the massaged version $F’$. Instead,

we want to define a Scheme procedure that takes the body $F$ and directly returns the recursive procedure that corresponds to $F$!

Also that is easy to do (kind of…), though we run into another (small) problem, at least in Scheme and similar settings.

It’s not just desirable to avoid to massage the code of $F$ in $F’$, it is necessary to do the whole trick without having access of the actual code of $F$, because $F$ is a formal parameter of the procedure. This we are forced to treat the functional argument as black box.

Turning $F$ to $F’$ without having access to the code of $F$ is actually quite easy. In the concrete factorial example, the code massage from $F$ to $F’$ “rewrites” the code so that a self-application $f\ f$ was used in $F’$ instead of $f$, as in $F$. We can achieve the same effect from the outside. Instead of feeding $F’$ into $F’$ and have $F’$’s body duplicate the argument $F’$ into a self-application $F’\ F’$, we just do the self-application outside and hand over $F\ F$ as argument.

  (lambda (f) (F (f f)))   ;; corresponds (somehow) to F'

Now we can apply that construction to itself ((lambda (f) (F (f f))) (lambda (f) (F (f f)))), doing the same trick as before in the special setting where $F$ represented the effect of the body of the factorial function. The only thing left to do is to have $F$ as argument to the construction, like the following

(lambda (F)                ;;  F as argument
  (lambda (f) (F (f f)))   ;; corresponds (somehow) to F'
  (lambda (f) (F (f f)))   ;; and is applied to itself

We may write it also in math-notation, i.e., as expression from the $λ$-calculus, and it looks like this

$\lambda F. ((\lambda f. F\ (f\ f))\ (\lambda f. F\ (f\ f)))$

[NB: the conventions for when and how to use parentheses in the $λ$-calculus are different from the conventions in Lisp or Scheme. One just has to be careful with that. For instance, if we had written above $F\ f \ f$ instead of $F\ (f\ f)$, it would look as if that corresponded to (F f f) in Scheme, but it does not; it would correspond to ((F f) f) in Scheme (and would not do the job). Just something one needs to keep in mind.]

Anyway, this expression is known in the $λ$-calculus as […drum rolls…]

the $Y$-combinator!

There are slight reformulations of that doing the same (for instance using let). And there are other such functions to achieve recursion, but doing it differently in a more serious manner, one of which we will (have to) look at.

First, let’s take the above $Y$ and try it out in Scheme, giving it its traditional name first

  (define Y (lambda (F)
    ((lambda (x) (F (x x)))
     (lambda (x) (F (x x)))))

resp. let’s use an equivalent reformulation with let, which is slightly shorter

  (define Y
    (lambda (F)
      (let ((f  (lambda (x) (F (x x)))))
	(f f))))

So, it took some meandering, we finally came up with a Scheme procedure that corresponds to the $Y$-combinator, which is known to achieve our goal: turn a procedure body like $F$ into a recursive procedure.

Then let’s reward ourselves and use it to run a version of factorial using the $Y$ combinator. Here’s again the body of the factorial from the beginning (see here):

  (define F (lambda (f)    
	      (lambda (n)
		(if (= n 0)
		    1
		    (* n (f (- n 1)))))))

and then proudly apply our $Y$ combinator to it:

  (Y F)

Ouch! That crashes the interpreter with a stack overflow. That’s bad news.

Wait a second, not so eagerly.

Crashing the interpreter is sure not desirable, but always look at the bright side: it’s good news too! The application is non-terminating, resp. in practice, it runs out of stack memory. That’s indeed a good sign, namely a sign of a recursion. Unfortunately a recursion gone wrong.

At first sight, it might be puzzling: we have encoded the famous $Y$ combinator but it does not work. As mentioned, however, $Y$ is not the only combinator to achieve the trick, there are variations of the general idea of self application.

The equation for $Y$ from above was written as term of the $λ$-calculus. Scheme can be seen as an implementation of the $λ$-calculus (with additional features needed for practical programming such as I/O etc). To be precise, there are also different $λ$-calculi, including many different typed versions, but Scheme most closely resembles an untyped $λ$-calculus.

But Scheme is a programming language, executed in a particular way, namely doing applicative order: arguments in an application need to be evaluated first before handed over in a procedure call. $λ$-calculi are often presented without fixing an evaluation strategy, resp. the evaluation strategy is left open and arbitrary. As presented in the lecture, for purely functional settings, the evaluation is based on substitution, the so-called substitution model from SICP. An expression can have multiple places where do so a substitution, i.e., multiple opportunities to apply a procedure to its argument(s), and an evaluation strategy fixes which one(s) should or could be taken. The lecture covered applicative and normal order evaluation, as the two practically relevant one for functional languages, but for the $λ$-calculus one can study more strategies (which involves where to evaluate and when to stop. Some strategies even allow multiple places in parallel or allow random choices). As a side remark, for $λ$-calculi one often speaks also of reduction strategies instead of evaluation strategies, and the basic substitution step is called a $β$-reduction step (but it’s another word for substituting the formal parameter of a function by its actual argument), and evaluation means “reducing” an expression to its value.

Scheme uses applicative order, it follows eager evaluation. And that’s the problem here. If we apply $Y$ to $F$, $F$ gets substituted into the body of $Y$, which is another (self-)application, that needs to be evaluated. After substitution, there is another (self-)application, so the process never ends, there is each time still another application as argument, and eager evaluation requires that the argument needs to be evaluated, so it never stops:

\[\begin{array}[t]{l@{\qqad}l} Y\ F & \rightarrow \\ \mathit{let}\ f = \lambda x. F (x\ x) \mathit{in}\ f\ f & \rightarrow \\ (\lambda x. F (x\ x))\ (\lambda x. F (x\ x))& \rightarrow \\ F\ ((\lambda x. F (x\ x))\ (\lambda x. F (x\ x))) & \rightarrow \\ F\ (F\ ((\lambda x. F (x\ x))\ (\lambda x. F (x\ x)))) & \rightarrow \ldots \end{array}\]

My bad, it’s recursion, but useless…

But it can be repaired. What’s needed is to delay the further evaluation of self-application argument, something like

\[\begin{array}[t]{rl} (\lambda x. F\ (\mathbf{delay}\ (x\ x)))\ (\lambda x. F\ (\mathbf{delay}\ (x\ x))) & \rightarrow \\ F\ (\mathbf{delay}\ ( \begin{array}[t]{l} (\lambda x. F\ (\mathbf{delay}\ (x\ x))) \\ (\lambda x. F\ (\mathbf{delay}\ (x\ x))))) \end{array} \end{array}\]

At that point, the argument of the outermost $F$ is not further explored, but handed over as value to $F$. After that substitution step, its an expression that looks like:

\[\begin{array}[t]{lll} \lambda n. & \mathit{if}\ & n= 0 \\ & \mathit{then}\ & 1 \\ & \mathit{else} & n \times \langle\text{self-application again (with delay)}\rangle (n-1) \end{array}\]

That’s a function that takes a number as argument, and does the body of the factorial and uses itself again as continuation in the recursion case. In particular, the body after $\lambda n$ is not further evaluated. It only starts getting into action when we provide a numerical argument. But this time, when giving a numerical argument, the recursion will stop, as at some point it will hit the base case, (at least for arguments $\geq 0$), just as the factorial does.

Now how do we do that form of delaying? Not evaluating arguments in a procedure call also underlies normal-order evaluation and the closely related notion of lazy evaluation. It also called delayed evaluation (or call-by-need), just what we are looking for. The lecture discusses two special forms delay and force in that context, but we also discuss how one can delay evaluation without relying on those built-in special forms.

It goes like this: First observe that a $λ$-expression like $\lambda x. e$ is a value, it counts as evaluated. In the $λ$-calculus, one might find places in the body $e$ where one could reduce, if one allowed substitutions to be done at any place inside an expression, not only on the top-level, but that’s not how it works in Scheme (or programming languages in general). Procedures only get evaluated when and if actually called. Now suppose that $e$ represents itself a function. It could itself be an application but after some evaluation steps it will evolve into a function. But by adding a $\lambda$ in front and applying $e$ to the formal argument $x$, we can delay the evaluation of $e$:

\[\lambda x. e\ x\]

That’s the trick that delays the evaluation of $e$ until an actual argument is provided. NB: in the $λ$-calculus, $e$ and $\lambda x. e\ x$ are said to be $η$-equivalent (“eta-equivalent”). Of course, it’s required that $e$ does not by coincidence mentions $x$ as free variable. But we can also pick another variable instead of $x$ if need be.

The trick make sense only if $e$ corresponds to a function, so (lambda (x) (1 x)) is not really meaningful. In the 100% pure and theoretical $λ$-calculus, everything is a function anyway and one needs not to worry. In Scheme $1$ is not a function, so we would have to be careful, but thankfully, the self-application $x x$ represents a function. So we can use the $η$-delay trick and write it up like that:

$Y' = \lambda F. ((\lambda x. F\ (\lambda y. x\ x\ y))\ (\lambda x. F\ (\lambda y. x\ x\ y)))$

That’s also known as the strict variation of the $Y$ combinator, as it does the job for eager functional languages like Scheme (and strict means following eager / applicative order evaluation).

And now, we are really done! For good measure, let’s just give the corresponding Scheme code.

(define Y'
  (lambda (F)
    (let ((f  (lambda (x) (F (lambda (y) ((x x) y))))))
      (f f))))

Wrapping up some loose ends

The $Y$-combinator is also called Curry’s paradoxical combinator (after Haskell Curry), and $Y$ and its variants are known as fixpoint combinators. Ultimately, those are just complicated functions or procedures, exploiting self-application in one way or the other. But why combinators? There’s no deep meaning behind it. Ultimately (and for historical reasons) a $λ$-term without free variables is called a combinator. With this terminology, (lambda (x) (* x x)) is a combinator that calculates squares as there are no free variables. Of course if we count * as free variable, which we should if we take it 100% exact, then it’s not a combinator, but let’s ignore that here and no one speaks like that anyway and says “square-combinator”…. Anyway, there are versions of the $λ$-calculus that do away with variables altogether. One cannot even write down “procedures” with formal parameters, as there are no variables at all and one is forced to work with combinators only. The calculus looks quite alien, and it’s connected to combinatory logic. Indeed, the $λ$-calculus (both typed and untyped) have roots and deep connections (also) in and to logics.

Why is it called paradoxical combinator? That has to do with said connections to logic. Curry and others invented and investigated such combinators in connection with (foundations of) logics, and $Y$ and its friends have connections to logical paradoxes.

Why fixpoint operators? As it turns out, applying $Y$ to a function (like $F$) calculates what is called a fixpoint of its argument, like a fixpoint of $F$. A fixpoint of a function as such is easy to understand: a fixpoint of $f$ is a value $a$ such that $f(a) = a$. For our specific $F$, the fixpoint of the construction results in the factorial:

\[Y\ F = f_\mathit{factorial}\]

but it’s a general observation: A recursive function can be understood as fixpoint of a function representing the effect of its body, and a $Y$-combinator calculates the proper fixpoint.

Proper fixpoint means, the smallest fixpoint though working out in which way to understand “small” and understand why it always exists and why it is uniquely defined would require more explanations and background. Fixpoints are quite interesting, for instance, there is a connection between “eager” and finite data structures which are smallest fixpoints of some construction and “lazy” and potentially infinite data structures, like streams, which are largest fixpoints. But we leave it at that (perhaps for a later post) as the text is getting longish already…