Type inference (Algos W and J …)


In the lecture, we discussed a type inference algorithm, an impoverished version of classical type inference. There were some remarks from the audience about “algorithm J” and “algorithm W”, both variations of the theme. I did not go into details, especially since my recollection of the specifics of algorithm J was hazy (except that I knew it's a lesser-known variation of the more famous algorithm W). So here are some remarks about the larger picture. It's not a panoramic overview of type inference; there are many variations and much literature on the theme. It's more a set of remarks connecting what was covered in the lecture to algorithms W and J, and I don't go into much detail anyway.

Our variant from the lecture

The lecture formalized type inference. Compared to the standard ML-style type inference algorithms, our presentation is

  1. for a restricted problem (monomorphic let) and it's
  2. presented “differently”
    • generating constraints
    • rule-based presentation

The first item is a real restriction; the other points are presentational.

Monomorphic let

That's the real difference to the established type inference problems and the corresponding algorithms. What is conventionally called algorithm W or algorithm J handles polymorphic let, and so addresses a more complex problem. But they likewise make use of unification. We have discussed that in the lecture and there is text in the script, so I leave it at that here, apart from the small illustration below.
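A small illustration in OCaml syntax (not code from the lecture): with ML-style polymorphic let, the bound identifier gets a type scheme and can be used at different instances in the body; a monomorphic let fixes one type instead.

```ocaml
(* accepted with polymorphic let: id gets the scheme 'a -> 'a and is
   instantiated at int and at bool *)
let pairs =
  let id = fun x -> x in
  (id 1, id true)

(* with a monomorphic let, as in the lecture's restricted problem, id
   would get a single monotype; using it at both int and bool would
   then produce an unsolvable constraint, and the program is rejected *)
```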

Generated constraints and delaying unification

One key technique of the type inference algorithm and its variants is unification. More precisely, term unification, where types (which can include type variables) are the terms that undergo unification. In many presentations, the type inference rules resp. the algorithm use unification “on-the-fly” or “eagerly”. Eagerly in the sense that while processing the program, the rules invoke unification as a subprocedure whenever needed, i.e., when a rule requires a unification constraint to be solved. The most interesting case is probably the rule for function application, but unification is used in other rules as well (though not in all). Anyway, the presentation in the lecture did not go for that eager approach. Instead of solving a unification constraint right away when processing a rule requires it, it simply collects the corresponding constraint for later, without attempting to solve it at that point while traversing the program (i.e., its syntax tree).

That leads to a two-phase approach: the first phase traverses the program and generates a set of constraints, or a constraint system, and the second phase uses unification to solve that constraint system.
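Here is a minimal sketch of phase one in OCaml, for a tiny expression language with monomorphic let; the datatypes and names (ty, expr, infer, fresh, …) are my own choices for illustration, not the lecture's notation. The application case records the equation it needs instead of unifying eagerly, and the let case simply reuses the inferred type of the bound expression, without generalization.

```ocaml
(* phase one: constraint generation only, no unification yet *)

type ty =
  | TInt
  | TBool
  | TVar of string              (* type variables "a1", "a2", ... *)
  | TArrow of ty * ty           (* function types t1 -> t2 *)

type expr =
  | Int of int
  | Bool of bool
  | Var of string
  | Lam of string * expr          (* fun x -> e *)
  | App of expr * expr
  | Let of string * expr * expr   (* let x = e1 in e2, monomorphic *)

(* a constraint is simply an equation between two types *)
type constr = ty * ty

let counter = ref 0
let fresh () = incr counter; TVar ("a" ^ string_of_int !counter)

(* infer env e = (t, cs): a type t for e together with the constraints
   cs that any solution must satisfy *)
let rec infer (env : (string * ty) list) (e : expr) : ty * constr list =
  match e with
  | Int _  -> (TInt, [])
  | Bool _ -> (TBool, [])
  | Var x  -> (List.assoc x env, [])   (* unbound variables: Not_found *)
  | Lam (x, body) ->
      let a = fresh () in
      let (t_body, cs) = infer ((x, a) :: env) body in
      (TArrow (a, t_body), cs)
  | App (e1, e2) ->
      let (t1, cs1) = infer env e1 in
      let (t2, cs2) = infer env e2 in
      let a = fresh () in
      (* the eager style would unify t1 with t2 -> a right here; we
         only record the equation for phase two *)
      (a, (t1, TArrow (t2, a)) :: cs1 @ cs2)
  | Let (x, e1, e2) ->
      (* monomorphic let: x gets exactly the type inferred for e1,
         without generalizing over its free type variables *)
      let (t1, cs1) = infer env e1 in
      let (t2, cs2) = infer ((x, t1) :: env) e2 in
      (t2, cs1 @ cs2)
```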

In the lecture, we had a look at unification of two types, but it works analogously for a set containing pairs of types. Having that two-phase set-up may have the disadvantage that type errors are not detected right away. Perhaps extra effort is also necessary to pinpoint the place responsible for the error (for meaningful error messages). Nonetheless, the algorithmic rule system looks easier on the eye, in that it disentangles the generation of the unification constraints from solving them (via unification).
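Continuing the sketch above, phase two solves the collected constraints. The unification works on the whole list of type pairs, returning a substitution as a list of bindings; error reporting is reduced to a bare exception here, which is exactly the place where pinpointing the offending program location would take extra effort.

```ocaml
(* phase two: solve the constraint set by unification; uses the ty and
   constr types from the sketch above *)

let rec occurs x = function
  | TVar y -> x = y
  | TArrow (t1, t2) -> occurs x t1 || occurs x t2
  | TInt | TBool -> false

let rec subst (x, s) = function
  | TVar y when y = x -> s
  | TArrow (t1, t2) -> TArrow (subst (x, s) t1, subst (x, s) t2)
  | t -> t

let subst_constrs b cs =
  List.map (fun (t1, t2) -> (subst b t1, subst b t2)) cs

exception Type_error of string

(* unify works on the whole constraint list, not just a single pair *)
let rec unify (cs : constr list) : (string * ty) list =
  match cs with
  | [] -> []
  | (t1, t2) :: rest when t1 = t2 -> unify rest
  | (TVar x, t) :: rest | (t, TVar x) :: rest ->
      if occurs x t then raise (Type_error "occurs check failed")
      else (x, t) :: unify (subst_constrs (x, t) rest)
  | (TArrow (s1, s2), TArrow (t1, t2)) :: rest ->
      unify ((s1, t1) :: (s2, t2) :: rest)
  | _ -> raise (Type_error "constructor clash")

(* apply a substitution, binding by binding, left to right *)
let apply_subst s t = List.fold_left (fun t b -> subst b t) t s

(* example: infer and solve for  fun f -> fun x -> f x  *)
let () =
  let (t, cs) = infer [] (Lam ("f", Lam ("x", App (Var "f", Var "x")))) in
  let s = unify cs in
  ignore (apply_subst s t)   (* yields (a -> b) -> a -> b, up to renaming *)
```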

So, it’s not a real difference, and this two-phase presentation has shown up in the literature here and there.

Rule-based presentation

Perhaps using rules for that type inference algorithm is not unusual. However, one often finds it presented not as rules, but in some form of pseudo-code. It's largely a matter of taste. Two arguments (in my eyes) speak for a rule-based presentation (that's why the lecture uses one). One is that rule-based formulations are often used to suppress “negative cases”. That holds for the type system specification as well as for phrasing the algorithmic version by rules. Underlying a rule system is the understanding that it formalizes things like “this program is well-typed if there exists a derivation” and “T is the type of the program if there exists a derivation that establishes that”. And, additionally and implicitly, the negative case: if there does not exist a derivation, then the program is ill-typed.

If one explicitly writes it as a (recursive) procedure in pseudo-code or even code, one is less inclined to just omit those cases where the procedure fails. Those “else”-cases should be covered, and in a real algorithm, adequate error messages should be generated. In a rule-based presentation, one often suppresses the negative cases, and no one complains about that. Actually, one can easily cover failing cases in a rule-based presentation as well, allowing one to “derive” a negative verdict. It's only that mostly one chooses to focus on the positive cases in an abstract presentation.

That's one reason. A second reason might be the following. While a pseudo-code recursive presentation is an option for an algorithmic version, for the specification of such systems one is better off with rules. The specification is, in most cases, given non-algorithmically and non-deterministically. It specifies when a program is well-typed, resp. what type(s) it has, namely when there exists a derivation that evidences that, but there can be many different derivations with different resulting types. It's a nice, compact, and standard presentation of a type system.
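To make the non-determinism concrete, here are the two central rules in their standard Curry-style textbook shape (not necessarily verbatim the lecture's rules): the abstraction rule has to guess the argument type $T_1$, which nothing determines, so for instance $\lambda x.\,x$ can be given the type $T \to T$ for every choice of $T$.

$$
\frac{\Gamma, x{:}T_1 \vdash e : T_2}{\Gamma \vdash \lambda x.\, e \;:\; T_1 \to T_2}
\qquad\qquad
\frac{\Gamma \vdash e_1 : T_2 \to T \qquad \Gamma \vdash e_2 : T_2}{\Gamma \vdash e_1\, e_2 \;:\; T}
$$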

Given such a specification, one can turn it into an algorithm. Then one has to tackle the task of establishing a connection between the specification and the algorithm, namely that the algorithm does “exactly” what the type system specifies. That may be easier to establish (or at least more elegant to write down) if both the type system specification and the algorithmic version are presented similarly, namely both as rule systems.

As a caveat: when establishing the connection between a type system (for instance our type system) and the algorithmic version, we said that the algorithm should do what the specification does. But, of course, that can't be literally true. The type system can give different results for a program: that captures the polymorphic nature of the type discipline. The algorithm needs to be deterministic. So it's not literally true that (for a given program) a type is derivable in the specification iff it's derivable in the algorithmic version.

The relationship is (in our case):

  • Soundness: If the algorithm derives a type, then the type system can also derive that type.
  • Completeness: If the type system derives a type T, then the algo derives a type (in particular, it does not fail) of which T is a special case (substitution-wise).
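In symbols, leaving the typing context implicit and writing $\vdash e : T$ for derivability in the specification and $\mathit{infer}(e)$ for the algorithm's result (notation chosen here just for stating the two properties, not the lecture's):

$$
\begin{array}{ll}
\text{Soundness:} & \mathit{infer}(e) = T \;\Rightarrow\; {\vdash}\, e : T\\[2pt]
\text{Completeness:} & {\vdash}\, e : T' \;\Rightarrow\; \mathit{infer}(e) = T \text{ for some } T \text{ with } T' = \sigma(T) \text{ for some substitution } \sigma
\end{array}
$$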

In other words, the algorithm derives the most general type (if it exists). It's a property of the type system that such a type exists for well-typed programs, and that it's unique. That is connected with the fact that there exist unique most general unifiers (unique up to renaming of variables). It also gives the possible types of a well-typed program the shape of a (semi-)lattice, and the algorithm determines the largest, most general element in that lattice for a well-typed program. This most general type is also known as the principal type. One can make a distinction between systems having principal types or principal typings, but those subtleties don't bother us.

There are other forms of polymorphism, different from the generic polymorphism here. One is subtype polymorphism, also called inclusion polymorphism. On some very high level, when dealing with subtyping instead of generic polymorphism, the picture is the same. The type system derives many different types: if a program has a type, it also possesses all supertypes of that type. It's generally organized in such a way that the subtyping relation is an order relation (where least upper bounds exist etc., and there is a largest element, which in Java is Object). An algorithm for subtyping deterministically gives back the unique best element. Unlike for generic polymorphism, where the best element is the most general one, in the presence of subtyping the best element the algo gives back is the most specific one. Of course, the order relations are different: in the first case it's specialization via substitution, in the second case it can be seen as subset or inclusion. But still, both order relations capture specialization. One can reflect on that difference. It's not related, however, to the may-must distinction of analyses we have discussed in connection with data-flow analysis.
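The “many types via supertypes” reading corresponds to the standard subsumption rule of subtyping systems (again the textbook shape, not a rule from our lecture):

$$
\frac{\Gamma \vdash e : S \qquad S <: T}{\Gamma \vdash e : T}
$$

In Java terms: an expression of type String also has type Object, since String <: Object.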

W vs. J

Actually, both W and J are covered in Milner's original article “A Theory of Type Polymorphism in Programming”. The differences are actually not big; they are indeed variations of the “same” procedure. As stated in the paper, the second algorithm, algorithm J, is a “simpler” and more efficient version of algorithm W. The functional “language” covered in that paper is quite similar to the one from our lecture. It supports let-bindings and recursion (the latter written syntactically slightly differently), and in addition to our calculus there are product types (for tuples) and disjoint sum types. The differences between J and W are not big (and the results are equivalent). The two main differences, as also stated in the paper, are that J operates with a specific form of substitutions, and that the unification is not handled functionally, but imperatively. The specific form are idempotent substitutions (a form of substitution that generally has nicer properties than arbitrary ones). So, with a grain of salt or two, J and W are the “same”. And, as said, the main difference to the presentation of the lecture is that we simplified the algorithm by covering just monomorphic lets.
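To give a rough feel for the imperative treatment of unification (a separate, self-contained sketch; the representation of type variables as mutable cells and the names Unbound/Link are my choices, not Milner's notation): instead of returning and composing substitutions as a W-style formulation does, unification records its bindings by updating the variables' cells in place, so the “current substitution” lives in the store.

```ocaml
(* imperative unification: a type variable is a mutable cell *)

type ty =
  | TInt
  | TArrow of ty * ty
  | TVar of tvar ref
and tvar =
  | Unbound of int          (* still a free variable *)
  | Link of ty              (* has been unified with this type *)

(* follow Link chains so we always look at the current binding *)
let rec repr t =
  match t with
  | TVar ({ contents = Link t' } as r) ->
      let t'' = repr t' in
      r := Link t'';        (* path compression, purely an optimization *)
      t''
  | _ -> t

exception Type_error

let rec occurs r t =
  match repr t with
  | TVar r' -> r == r'
  | TArrow (t1, t2) -> occurs r t1 || occurs r t2
  | TInt -> false

(* success leaves its effect in the ref cells; nothing is returned *)
let rec unify t1 t2 =
  match repr t1, repr t2 with
  | TInt, TInt -> ()
  | TVar r1, TVar r2 when r1 == r2 -> ()
  | TVar r, t | t, TVar r ->
      if occurs r t then raise Type_error else r := Link t
  | TArrow (a1, b1), TArrow (a2, b2) -> unify a1 a2; unify b1 b2
  | _ -> raise Type_error

(* example: unifying  a -> int  with  int -> int  binds a to int in place *)
let () =
  let a = TVar (ref (Unbound 0)) in
  unify (TArrow (a, TInt)) (TArrow (TInt, TInt))
```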