Unification constraints and idempotent substitutions

In a different post, I remarked on “algorithm W” and “algorithm J” without going into details. There was, however, some mention that algorithm J, a variation on algorithm W, is based on idempotent substitutions, which was likewise not elaborated. I will do that here. The discussion will not be explicitly connected to type inference, in particular not to algorithm J.

I did not trawl the internet deeply for supporting evidence or uses of idempotent substitutions and what role they may play here and there. Thus, the text is a bit speculative. But there are presentations which a priori define substitutions as idempotent functions from variables to terms. There is also a notion of a “substitution away from some set of variables”, and those “away-substitutions” are closely connected as well. So it seems the concept of idempotent substitutions mentioned in connection with algorithm J is more significant than a trick to make algorithm W more efficient.

Substitutions and unifiers

We are dealing with terms, in particular terms which can contain variables (in the context of type inference, the terms are the types and the variables type variables, but never mind). Substitutions are simply (finite) mappings from variables to terms, and they are considered as mappings from terms to terms, as well (a substitution is “lifted” to terms).
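To make this concrete, here is a minimal OCaml sketch; the names term, subst, and apply are mine, chosen for illustration only. Terms are variables or function symbols applied to argument lists, and a substitution, represented as an association list, is lifted to terms by structural recursion:

(* First-order terms: variables, or function symbols with arguments.
   Constants are function symbols with an empty argument list. *)
type term =
  | Var of string
  | Fun of string * term list

(* A substitution: a finite mapping from variables to terms. *)
type subst = (string * term) list

(* Lifting a substitution to terms: replace each variable by its image,
   leaving variables outside the mapping untouched. *)
let rec apply (s : subst) (t : term) : term =
  match t with
  | Var x -> (match List.assoc_opt x s with Some t' -> t' | None -> Var x)
  | Fun (f, args) -> Fun (f, List.map (apply s) args)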

Unifiers are substitutions; more precisely, a unifier for two terms is a substitution that, when applied to the two terms, makes them identical, i.e., syntactically equal. The notion of unifier can, mutatis mutandis, also be used for sets of terms or, more importantly, for sets of pairs of terms.
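For example, with f a binary function symbol and a, b constants, the two terms f(x, b) and f(a, y) are unified by the substitution $[x \mapsto a, y \mapsto b]$: applied to either term, it yields f(a, b).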

Substitutions can be ordered wrt. their “generality” or “specificness”; we did that in the lecture. Additionally, it's a fact that there exist most general unifiers. The wording “the most general unifier for t1 and t2” implies that it is unique. Of course, there might not be a unifier for a given pair of terms at all, but if there are two most general unifiers, then they are equal. So whether most general unifiers are unique also depends on what notion of equality and inequality one uses. Let's discuss that next.
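To illustrate the ordering: the terms x and f(y) are unified by $[x \mapsto f(y)]$, but also by $[x \mapsto f(a), y \mapsto a]$. The first unifier is more general, since the second can be obtained from it by additionally substituting $[y \mapsto a]$; indeed, every unifier of the two terms factors through $[x \mapsto f(y)]$ in this way, which is what makes it a most general one.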

Equality of substitutions

The first reaction may be: two substitutions are equal, well, what is there to say, when they are just equal. Substitutions are mappings from variables, and they are equal if they map the same variables to the same results. That's the most obvious definition: substitutions are equal if they are identical functions.

However, being identical is not the only valid notion of equality.1 Substitutions are functions working on syntax: they translate variables to terms resp. terms to terms. Syntax is often not interesting in itself; it's used to describe other things. Sometimes syntax, like terms, is treated as being of interest in itself: rewriting takes terms and transforms them into other terms, and also in our structural operational semantics (which resembles rewriting), syntactic terms are transformed to describe the steps of a program.

But still, syntax and terms are most often not interesting in themselves; they are used to describe other things.

Back closer to the topic at hand: types, given as terms specified by the corresponding grammar. A type written according to that syntactic definition is almost meant literally, but not quite. The type int represents the type int (which semantically captures all integers) and nothing else, and int -> int represents int -> int (which semantically captures functions from integers to integers) and nothing else. There is one aspect, though, where taking the syntax of a type literally takes it too far, namely when it comes to type variables.

Unavoidably, let’s look at the identity function. It’s written here not in our calculus, but in two different functional languages, OCaml

fun x -> x;;
- : 'a -> 'a = <fun>

and Haskell:

Prelude> :type \x -> x
\x -> x :: p -> p

The use of the type variable expresses that the identity function is polymorphic: it works for all concrete types uniformly. However, the choice of the type variable used to express that polymorphism is irrelevant. Indeed, the first example chooses 'a, the second one p. But it’s the “same” type: it represents functions taking input from some type and returning a result of the same type. It makes no sense to distinguish the two types. In other words, types are better not taken literally, but up to the choice of variables. This is analogous to the usual understanding that the choice of bound names is irrelevant, though the type variables 'a resp. p are not bound.2

Of course, we have to be careful: it’s not that the choice of variables is completely irrelevant. A type ('a -> 'b) is not the same as ('c -> 'c); it’s not that kind of irrelevance. One can choose different names, but one has to do so consistently, insofar as one cannot rename two different variables to the same one. Actually, a renaming is nothing else than a substitution under two restrictions: it is a mapping from variables to variables (as opposed to general terms), and it is injective, i.e., it does not identify two separate variables.3 Both requirements have to do with the intention that renaming should be undoable: if one renames the variables of a term, one can rename them back. That is important in particular since we use the notion of renaming to state when two terms or types are equivalent, and equivalence is a symmetric notion. If a type T1 can be renamed to T2, then T2 must be renamable back to T1.
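To illustrate: the renaming $['a \mapsto 'c, 'b \mapsto 'd]$ turns 'a -> 'b into 'c -> 'd and can be undone by $['c \mapsto 'a, 'd \mapsto 'b]$. The mapping $['a \mapsto 'c, 'b \mapsto 'c]$, in contrast, is not a renaming: it is not injective, it turns 'a -> 'b into 'c -> 'c, and no substitution maps that back.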

[fresh variables…]

Idempotence

All nice and good, but the text here is supposed to talk about idempotence. What about that?

Let’s start by defining what idempotent means. Often, idempotence is used for binary operations, but here we are talking about idempotence of functions (namely substitutions). It simply means that $f \circ f$ is the same as $f$, i.e., $f^2 = f$. For a substitution it means: after applying the substitution once, applying it a second time (or more often) makes no difference.

That was the mathematical definition of idempotence, but what does it mean for substitutions, concretely? A substitution $\sigma$ is idempotent exactly if no variable from its domain occurs in any term of its range, i.e., if $\mathit{dom}(\sigma) \cap \mathit{vars}(\mathit{ran}(\sigma)) = \emptyset$: in that case, a second application finds nothing left to replace.
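For example, $\sigma = [x \mapsto f(y)]$ is idempotent: a first application replaces x by f(y), and a second application finds no occurrence of x left, since y is untouched by $\sigma$. In contrast, $\tau = [x \mapsto f(x)]$ is not idempotent: one application turns x into f(x), a second into f(f(x)), and so on. The culprit is precisely that the domain variable x reappears in the range.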

Solving unification constraints

In the lecture, we had a look at the, or rather a, unification algorithm, formulated for a pair of terms as input. As remarked, if one has solved the problem for a pair of terms, one can immediately generalize that to a set of pairs of terms. Such a set can be seen as a set of elementary unification constraints, or as a conjunction of elementary unification constraints.

The notion of constraints and constraint systems is very broad; there are many different kinds of constraints and consequently different algorithms to deal with specific settings. The ingredients of a constraint system are “formulas” containing variables, together with a notion of solution of the constraints: a choice of values for the variables that solves or satisfies the given constraints. So it’s about satisfiability, logically speaking. There are variables and “formulas”, the notion of solution, and, implicitly, a domain over which the constraints are solved, i.e., from which to draw potential values for solving the constraints.

Earlier in the lecture, we have seen how to solve equality or inclusion constraints over values from some lattice. More precisely, the constraints were not arbitrary equalities or inequalities; they were of a simpler form, which allowed one to see the constraint system as a (pre-)fixpoint problem and to use adequate techniques for that.

Unification can also be understood as a constraint problem: the elementary formulas could be called unification constraints, and in the general case, one has to solve a set of them (or a big conjunction).

What’s the domain over which unification constraints are solved? It is specific in the following sense: the values to choose from for a solution are “formulas” themselves, i.e., terms. A solution to a unification problem is a substitution, which is a mapping from variables to terms, i.e., to syntax.

In the following, we take a second look at how to solve unification problems. Actually, it’s not really different from the algorithm from the lecture, both give back “the” MGU anyway, but it’s a slightly different view of it.

The algorithm works with a set of pairs of terms. This time, it’s important not to restrict oneself to a single pair which one recursively traverses, but to really work with a set. That’s because an individual step of the algorithm simply picks one pair from the set, inspects it, and, as a reaction, replaces the chosen constraint by new constraints, marks it as solved, or flags a failure. So, the algorithm does not use recursion over pairs of terms accumulating substitutions, but massages the set of constraints in a stepwise manner, simplifying them more and more. In the basic steps, a constraint f(t1,…,tn) = f(s1,…,sn) is replaced by the constraints t1 = s1, …, tn = sn; a constraint x = t, where x does not occur in t, counts as solved and is used to substitute t for x in all remaining constraints; and a constraint with two different function symbols at the top, or a constraint x = t where x does occur in t, flags a failure.
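To make the stepwise view concrete, here is a minimal OCaml sketch, continuing the term, subst, and apply definitions from the earlier sketch (solve, occurs, and apply_all are again my own illustrative names, not from the lecture):

exception Unify_failure of string

(* The occur check: does variable x occur in term t? *)
let rec occurs (x : string) (t : term) : bool =
  match t with
  | Var y -> x = y
  | Fun (_, args) -> List.exists (occurs x) args

(* Apply a substitution to both sides of all remaining constraints. *)
let apply_all (s : subst) (cs : (term * term) list) : (term * term) list =
  List.map (fun (t1, t2) -> (apply s t1, apply s t2)) cs

(* Massage a set (here: list) of constraints step by step, turning it
   into a substitution or failing. *)
let rec solve (cs : (term * term) list) : subst =
  match cs with
  | [] -> []
  | (t1, t2) :: rest ->
    match t1, t2 with
    | Var x, Var y when x = y ->
      solve rest                          (* trivial constraint: drop it *)
    | Var x, t | t, Var x ->
      if occurs x t then
        raise (Unify_failure ("occur check fails for " ^ x))
      else
        (* x = t is solved: eliminate x from the remaining constraints,
           solve those, then record the binding for x. *)
        let s_rest = solve (apply_all [ (x, t) ] rest) in
        (x, apply s_rest t) :: s_rest
    | Fun (f, args1), Fun (g, args2) ->
      if f = g && List.length args1 = List.length args2 then
        (* decompose f(t1,...,tn) = f(s1,...,sn) into ti = si *)
        solve (List.combine args1 args2 @ rest)
      else
        raise (Unify_failure ("symbol clash: " ^ f ^ " vs. " ^ g))

For instance, solving the single constraint f(x, b) = f(a, y) from the example above yields $[x \mapsto a, y \mapsto b]$. Note also that the substitutions computed this way are idempotent: a solved variable is eliminated from all remaining constraints before its binding is recorded, so domain variables never reappear in the range. This is one of the places where idempotent substitutions arise naturally.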

See Wand’s article for a presentation along these lines.

Indeed, SAT is a simple constraint problem in that sense: there is a mathematical domain, the booleans, over which one solves the constraints.

Unification and occur check

In the lecture I mentioned that in Prolog, for instance, one can switch off the occur check to become more efficient, but at the risk of non-termination. According to Wikipedia, it seems that Prolog implementations “usually” omit the occur check…
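The classic example is the constraint x = f(x): no finite term solves it, since a solution would have to contain itself as a proper subterm. The occurs function in the sketch above rejects exactly this situation; a naive implementation without the check can loop, effectively chasing the infinite term f(f(f(…))).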

Footnotes

1 What would you expect for an equality relation in general?

2 More later

3 More later