This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
(d) is unaffected when 3 and V are not involved, and his remarks in Section 7.2 are helpful for clarifying the effect of permuting cut with an application of V-left.
Comparing NJ with LJ
37
whose image under > is the result of substituting Mdi) 4>{d2) A B AAB for the indicated assumptions A A B in AAB A
d,2
±
AAB T\-AAB AAB,A\-C A T,A\-C 4>{d2) The remaining operators present no special difficulties, so the result follows by a trivial induction on the number of inferences below the cut. For Theorem 2.2, let us suppose first that II terminates with an elimination whose major premise belongs to a maximal segment (call such an inference a crucial elimination). We prove by induction on the length of the segment that 0'(II) terminates with an essential cut. If the length is 1, the cut in question falls under case (3) (or case (2a) with the same proviso as before). This is just a matter of checking the various cases. For example, if the last inference of II is an application of A-elimination and II has the form shown on the left below, >f(U) will be as shown on the right:
ni
n2
A. B_ AAB A
0'(no /(n2) r h A A\- B A\- A r,AhiAB A/\B\-A r, A h A
If AAB had been introduced into II by an application of the intuitionistic negation rule, the case would be exactly analogous, with thinning taking the place of A-right in >'(II). Now suppose that the segment has length n + 1 and let IT' be obtained by permuting the last inference of II with the application of V- or 3-elimination immediately above it, then <j)'(JI') is obtained by permuting the final cut of >'(II) with the application of Vor 3-left whose conclusion is its left premise. Hence, the induction step follows by applying the induction hypothesis to the subderivation of II' which terminates with a crucial elimination whose major premise belongs to a segment of length n. The theorem now follows, in one direction, by a trivial induction on the number of inferences below the last crucial elimination of a non-normal derivation.
38
Normalization, Cut-Elimination and the Theory of Proofs
The proof of the converse is scarcely more interesting. A formula A occurring on the right of a sequent in a derivation d is said to have been introduced by axioms if d is an axiom, or the last inference 71 of d is an application of a left rule or cut and every occurrence of A on the right in a premise of TZ was introduced by axioms. The following can then be proved by induction on the construction of normal derivations: Lemma 2.4 For normal U, if there is no main route through U which terminates with an introduction segment, then the formula on the right of the conclusion of 4>'(J1) was introduced by axioms. It is also a straightforward matter to prove by induction on the construction of d\\ Lemma 2.5 If d is of the form d\
d2
r,AhB where A was introduced by axioms in the conclusion of d\, then the last cut of d is inessential. The desired result now follows from Lemmas 2.4 and 2.5 by induction on normal II. There is nothing to show except when the last inference of II is an elimination. In this case, the last inference of >'(II) will be a cut, and the derivation of its left premise will be ^ ( I T ) , where II7 is the derivation of the major premise of the last inference of II. Since II is normal, it follows (by Lemma 2.4) that the cut formula in the conclusion of ^'(n') was introduced by axioms and, hence, (by Lemma 2.5) that the cut is inessential. Theorems 2.1 and 2.2 establish a correspondence between normal derivations and (almost) cut-free ones (for intuitionistic logic, at least), but they fall short of establishing a correspondence between the two versions of the Hauptsatz. The cut-elimination theorem is not simply a demonstration of the completeness of the cut-free rules. If it were, it could be proved more simply by semantic methods. In fact, one of the most attractive features of LK is that its rules, minus cut, can be shown to be complete in a very straightforward way.11 The proof exploits the fact that each logical rule can be read backwards as an attempt to refute its conclusion. Given a sequent, the rules are applied backwards in an effort to find an interpretation which falsifies all the formulae on the right while verifying those on the left. The result will be a search tree. A branch of the tree is closed if the same formula appears on both the left and right side of its final sequent. Now, if the tree terminates with all its branches closed, it can be transformed into a derivation of the initial sequent simply by inverting it. On 11 Since it is obvious that cut is classically valid, this provides a rather direct proof of the adequacy of LK.
C o m p a r i n g N J with L J
39
the other hand, if it does not, a counterexample to the initial sequent can be extracted from it. 12 What is missing from this proof of 'cut elimination' is the association of a particular cut-free form with each derivation. Whether this is regarded as a significant loss depends upon one's point of view. Many applications of the theorem remain unaffected. For these one needs only that cut-free derivations have the subformula property and that relatively weak principles are sufficient to show, for every derivation, the existence of a cut-free one with the same conclusion.13 It is sometimes claimed, however, that Gentzen's Hauptsatz is the first major theorem of general proof theory, the study of the notion of proof for its own sake—without restriction on the principles employed and without regard to possible applications. Although its significance may not be very well understood, it is thought to have an intrinsic interest beyond its applications to foundational problems. Furthermore, although it deals only with the combinatorial properties of formal derivations, its interest is supposed to stem from what it tells us about the proofs represented by these derivations. The analogy between derivations and terms mentioned at the end of the previous chapter, together with the strong normalization theorem, suggests that reduction steps for derivations may be like rules for computing the value of a function. On this analogy a derivation and its normal (or cut-free) form represent the same proof, the latter perhaps representing it in a particularly direct way (just as numerals, the normal forms of closed arithmetical terms, are related in a direct way to the natural numbers which they denote). Implicit in this view is the idea that it is not merely the existence of normal (or cut-free) forms which is of interest, but also the procedure by which a derivation can be reduced to such a form. After all, to refer to a derivation d! as a normal form of d is to imply that d and d! are related by something more than their common assumptions and conclusion (since there are in general many other normal derivations which share these features of d' and which cannot be called a form of d in any sense); what distinguishes d' is that by carrying out a specific procedure d can be transformed into d'. It is not surprising, therefore, that the rough account of the relationship between the cut-elimination and the normal form theorems given above has been supplemented by a more refined analysis which compares the actual cut-elimination procedure for LJ with the normalization procedure for NJ. The most notable effort in this direction is to be 12 A good account of a completeness proof along these lines can be found in Chapter VI of Mathematical Logic by S.C Kleene, New York, 1967. To facilitate the proof, he considers a system G 4 which differs from LK in some minor respects. 13 See Kreisel's remarks on pages 329 and 364 of his "A Survey of Proof Theory," Journal of Symbolic Logic, Vol. 33, 1968, as well as the first section of his "A Survey of Proof Theory II," Proceedings of the Second Scandinavian Logic Symposium ed. by J.E. Fenstad, Amsterdam, 1971.
40
Normalization, Cut-Elimination and the Theory of Proofs
found in Zucker's paper, "The Correspondence Between Cut-Elimination and Normalization," cited above. Zucker considers first the negative fragments NJ~ and LJ~ of these calculi. In LJ~ he distinguishes two kinds of conversion step: (1) Permutative Conversions—These allow cuts to be permuted upwards past other inferences, contractions to be permuted downwards past other inferences and trivial cuts to be eliminated. (2) Essential Conversions—These replace cuts by cuts of lower degree. In NJ~, of course, there is only one kind of conversion, namely removing a maximal formula occurrence. The equivalence relation = on the derivations of LJ~ generated by the permutative conversions he calls strong equivalence and proves: Theorem 2.6 For all derivations d,d' of LJ~, d = d' iff >(d) = (/)(df). Let d, d',... range over the derivations of ZJ~, and II, I I ' , . . . over those of NJ~. d y\ df means that d' is obtained from d by applying an essential conversion, and II y\ II' means that II' is obtained from II by replacing a redex by its contractum. y is the transitive closure of >-i. Theorem 2.7 (Zucker) (1) Ifd yx d', then 0(d) y
Comparing NJ with LJ
41
He also points out that these results in turn imply their analogues for NJ~ so that strong normalization for NJ~ is equivalent to strong cut-elimination for LJ~, and uniqueness of normal forms for the derivations of NJ" is equivalent to the uniqueness of cut-free forms (up to strong equivalence) for LJ~~ derivations. These results are described in some detail because they are paradigms of the kind of results we would like to obtain for NJ and LJ. They conform to our expectations about the relationship between the sequent calculus and natural deduction as expressed by the mapping
A Am
L
*%
di m^
A
B
d Bj,Bj h C
AmhC
(I have omitted the additional formulae which may appear on the left of the conclusions of d, d1? etc., for the sake of simplicity, and will continue this practice below.) What follows is a slightly simplified version of Zucker's example of a
42
Normalization,
Cut-Elimination
and the Theory of Proofs
non-terminating proper reduction sequence:14 (2.1)
di d2 d4 ds Aj\- E Bj\- E dz Cn
(2.2) d* dh Cn'r F Dp\-F C V Dq\~F
di d3 d2 dl Aj\-E EuFmY-G BjhE Er,Fs^G Bj,Fs h G Aj.FmV-G A\JBk,Fm,FsY-G AV Bk,Fm V- G Cv DqiAvBkhG
(2.3) d\
dl
d4 CnVT
di DVVT
C V Dw\-F
di d2 AjhE Ei,Fm^G Al,Fm\-G
CvDq,AvBk
cww1cvDq,AvBk \-g
d2 d*z Bj*rE Er,F$hG B^Fs h G
,fs\-G
CvDq,AVBk\-G (2.2) is obtained from (2.1) by permuting the uppermost cut with V-left and (2.3) from (2.2) by permuting the bottom cut with contraction. Since the part of (2.3) written in calligraphic characters has the same form as (2.1), we can now repeat the procedure using it in place of (2.1), and so on ad infinitum. Zucker concludes from this example that any set of conversion rules for NJ which has the strong normalization property cannot correspond to the natural cut-elimination procedure. He does describe a more restricted set of rules for permuting cut with V- and 3-left. These preserve the correspondence between reductions in the two systems but, as he rightly points out, they are ad hoc. In fact, they scarcely deserve to be called permutative conversions because the transformations they specify are quite complicated and are defined by cases according to the principal connective of the cutformula involved. The exact significance of the results quoted above is a matter for debate. Zucker himself interprets them as demonstrating a failure of correspondence which "shows that there is indeed a combinatorial difference between sequent calculus and natural deduction, at least with regard to reduction procedures"; he even raises the possibility that "there may be (meaningful) properties of proofs which are preserved by all reductions in NJ but 14 Throughout this work the subscript on a logically complicated formula is associated with the formula as a whole, not with its rightmost component. So, for example, A V Bk should be read as (Av5) f e , not as AV(Bk), and similarly for the other logical particles.
C o m p a r i n g N J with L J
43
not in LJ." 15 On the other hand Pottinger, in his paper "Normalization as a Homomorphic Image of Cut-Elimination," 16 asserts that these same results provide a positive answer to the "question whether Cut-elimination procedures in L-systems are 'really the same thing as' normalization procedures in natural deduction systems." (In fact, Pottinger deals only with intuitionistic propositional logic, but his remarks apply equally well when quantifiers are included.) There are really two issues raised by Zucker's work: one is the failure of strong cut-elimination for LJ, and the other concerns the relationship between cut-elimination and normalization. As regards the first, I tried to indicate above how Zucker's counter-example depends upon a special feature of his indexing system. If his calculus is modified so that a formula occurrence need not be introduced with a new index, his construction can no longer be carried out. In fact, as Dragalin has shown, strong cut-elimination does hold for the full sequent calculus (classical as well as intuitionistic) relative to a 'natural' set of conversion rules. 17 Cuts can be permuted upwards past any other kind of inference without restriction, except that there should be no 'clash' of indices. To see what this means consider, for example, a figure of the form (2.4)
d!
d
A;WC
Meft
Y;AP\-B &,Af\Dr',Bq\-C cut T,A,AADr;Ap \~C
We cannot permute cut directly with A-left since this would result in the occurrence of Ap in the conclusion of d being incorporated into the premise of A-left. To avoid this, we pick some index m which occurs nowhere in d or d', and replace the latter by a derivation df* obtained from it by replacing all occurrences of Ap by Am. So the rule for permuting cut with A-left (in its right-hand premise) will be that (2.4) converts to (2.5)
d df* ;AphB A;Am;BqhC cut r , A; Am; Ap r- C , f T,A,AADr]Ap\-C
Such a restriction is quite natural if we bear in mind that conversions of this kind are intended to be simple permutations of inferences which should in all other respects leave the derivation unchanged. It is analogous to the requirement that, when a term is substituted for a variable in some expression, it be free for the variable in question. 15
Zucker, op. cit., Section 1.5.1. Annals of Mathematical Logic, Vol. 12, 1977, pp. 323-357. 17 See Appendix A below for further discussion of this issue and a sketch of his proof. 16
44
Normalization, Cut-Elimination and the Theory of Proofs
In view of what was said earlier about permuting cut with V- and 3left, we cannot expect a correspondence between this sort of cut-elimination procedure and the usual normalization one. In fact, when = is interpreted as the equivalence relation generated by this new set of permutative conversions, Theorem 2.6 fails in both directions for the derivations of LJ. Furthermore, the normalization procedure obtained by translating it via cj) into NJ is on the face of it neither particularly natural nor convenient. Nevertheless, the fact that strong cut-elimination holds relative to a natural set of conversions has some bearing on the second of the two issues mentioned above, that of the correspondence between cut-elimination and normalization. The customary normalization procedure for NJ is accepted uncritically in Zucker's work, and its image under <\> investigated. His belief that strong cut-elimination fails relative to the natural set of conversions for L J derivations is no doubt the reason why he does not consider whether there is a reasonable modification of this normalization procedure which corresponds to the cut-elimination one; it also leads him to his pessimistic conclusion about the possibility of a correspondence. There is, however, unquestionably an ad hoc character about the permutative reductions allowed in normalizing a derivation of NJ. Consider, for example, the following three pairs of figures: (2.6)
[5(a)]
n
A(x) 3zB(z) VxA(x) VxA(x)
My) (2.7)
n
A(x) VxA(x) 3zB(z) A(y) A(y) [B(a)}
[5(a)]
n
(2.8)
[B(a)\
n
A{x) 3zB{z) VxA{x) VxA{x) \/xA(x) V C
A(x) VxA(x) 3zB(z) VxA(x)VC VxA(x) V C
[B{a)\
\B(a)}
n
n
~ixA{x) A C 3zB(z) VxA(x) VxA(x)
3zB(z)
Vx^(x) A C Vx^l(x) A(y)
A(y) A(y) Anyone who knows the proof of the normalization theorem will recognize that the figure on the left in (2.6) must reduce to the one on the right, but it is almost impossible to imagine that there exists any general theo-
Comparing NJ with LJ
45
retical explanation of why this should be so in (2.6), but not in (2.7) or (2.8) (or in (2.6) and (2.8), but not in (2.7), if one accepts Martin-Lof s suggestion about normal forms18)—let alone one couched in terms of the properties of proofs represented by these derivations. It should not surprise us, therefore, that these features appear even more conspicuously ad hoc when they are translated into a cut-elimination procedure. We can define what it means for a formula occurring on the left of a sequent in an LJ derivation to be 'the major premise of an elimination', and allow a cut to be permuted with an application of V- or 3-left in the derivation of its lefthand premise only when the occurrence of the cut-formula in its right-hand premise satisfies this definition. Of course, the definition is very ad hoc, and the cut-elimination procedure which results can be tortuous (since it requires us to allow the permutative reductions to go from right to left as well as from left to right, i.e., to treat them as symmetric permutations rather than asymmetric reductions). But, reinterpreted in the appropriate way, Theorems 2.6 and 2.7 above will now hold for all of NJ and LJ. Conversely, if we start with the cut-elimination procedure for LJ, we can obtain a correspondence with a non-standard normalization procedure for NJ. The situations are not quite symmetrical, however, because the cutelimination procedure outlined above does seem natural, and lacks features which are obviously ad hoc. Unfortunately, this impression of naturalness derives in part from the fact that it is well-adapted to certain combinatorial peculiarities of the sequent calculus which NJ does not share. Faced with a situation of this kind, one can either suppose that there are significant differences between cut-elimination and normalization, and search for a more general framework within which these can be understood, or search for a framework within which such differences are explained away. It is my intention to pursue the latter course. The anomalies revealed by Zucker's work seem to lack intrinsic interest, and they pale beside the striking similarities between cut-elimination and normalization. I suggested above that the treatment of V and 3 in NJ is unsatisfactory even if the problems it creates for the correspondence are ignored. In the next chapter, therefore, I propose to take a closer look at natural deduction calculi to see how a revised treatment can be justified without reference to normalization. After that, I will consider what kind of correspondence with cut-elimination can be established for the revised calculus.
18 See pp. 253-4 of "Ideas and Results in Proof Theory" by D. Prawitz, in Proceedings of the Second Scandinavian Logic Symposium ed. by J.E. Fenstad, Amsterdam, 1971.
3
Natural Deduction Revisited Granted that Gentzen's N systems are natural—both in the sense that their derivations correspond quite closely to informal patterns of argument and in their treatment of the logical connectives, this does not preclude the possibility that other formulations of natural deduction are equally (if not more) so. These calculi do not correspond so closely to informal reasoning as to allow no room for variation. Furthermore, not every change in the formulation of their rules will affect the meanings of the connectives, and some may give rise to systems possessing certain formal advantages. Gentzen himself was the first to utilize this fact when he devised sequent calculi as a means of proving his Hauptsatz. Kreisel has remarked that "we can hardly hope that existing formalizations [of natural deduction] are exactly right for the new applications . . . After all, the systems were developed for other reasons, logical or aesthetic." 1 It is doubtful, for example, that Gentzen was interested in what properties of derivations (apart from the obvious one—that their conclusions remain more or less unchanged) are preserved by his reduction steps. Had he been, he would scarcely have introduced the mix rule as an auxiliary, justifying it only by a remark to the effect that in the presence of the other structural rules it is deductively equivalent to cut. (For we can show that any two derivations of the same end-sequent—and many more besides— are 'equivalent' if we combine Zucker's ideas about strong equivalence with the view that derivations which have the same translation in the calculus with mix are equivalent.) It is equally doubtful whether he was overly concerned with the relationship between combinatorial properties of derivations and structural properties of proofs, if indeed he thought in such terms at all. It is true that Gentzen prided himself on the similarity of his rules to the steps involved in actual reasoning, but this is a slightly different issue. On the few occasions he does consider the formal structure of derivations, he talks mostly about what he takes to be its artificial features. For example, 1
"A Survey of Proof Theory II," page 114.
46
Natural Deduction Revisited
47
he points out 2 that in a proof by cases (V-elimination) the tree form of derivations "does not bring out the fact that it is after the enunciation of [the disjunction].... that we distinguish the cases." As a matter of fact, NJ is not a particularly plausible candidate for a calculus the form of whose derivations accurately reflects the structure of the proofs they are supposed to represent. This last assertion is not based upon a particular conception of the nature of proofs, just on some general considerations about what constitutes an accurate representation. A derivation of NJ is a graph with a formula at each vertex, so it must represent an array of what these formulae stand for—propositions, formulae of some other language, judgments, mathematical constructions, or whatever—structured by some logically significant relation, which might be called 'immediately follows from'. The direction of the edge relation represents an ordering of logical rather than temporal priority (otherwise its transitive closure would surely have to be linear), and a fortiori indicates the dependence of each formula occurrence on its predecessors. Of course, this account of the matter is somewhat oversimplified if only because some connectives, notably V and —> , by virtue of their meaning require more information about dependence to be represented than the edge relation alone can convey. It is nonetheless roughly correct for derivations in the negative fragment. The elimination rules for V and 3 , however, do not fit comfortably into this pattern. The edges joining premises and conclusion in an application of either one of them do not exemplify the relation 'immediately follows from', nor does the ordering any longer reflect only logical priority. In the left-hand figure of (2.6) above, for example, the logical relationship between A(y) and 3zB(z) is the same as that between the lower occurrence of VxA(x) and 3zB(z). A kind of temporal order has now been introduced into the structure of derivations. (Significantly, these are the only two rules which may be permuted with the others in a derivation without destroying it.) In addition, both rules allow vacuous applications in a sense that the others do not; continuing the previous example, if no occurrences of B(a) are discharged by the inference, it is not clear that there is any significant logical connection between 3zB(z) and the remaining formulae occurring in the figure. The preceding remarks are not intended as criticisms of the elimination rules for V and 3 . One could well argue that rules of their form provide a more adequate means of formalizing the notion of proof. The point is rather that, if we take seriously the idea of a structural similarity between derivations and proofs, inferences need to be represented by formal rules in a uniform way, unless there is some compelling logical reason which justifies the differences. But it is difficult to argue that there is something about the meanings of V and 3 which explains the distinctive features of their 2
"Investigations into Logical Deduction," page 79.
48
Normalization, Cut-Elimination and the Theory of Proofs
elimination rules: why the order in which they are applied matters more than in the cases of the other connectives, for example, or why they allow vacuous applications. In consequence, it seems reasonable to conclude that there are combinatorial features of NJ derivations which have no analogue in the proofs they represent. It is also not my intention to challenge Gentzen's analysis of the logical particles. His analysis, however, does not determine uniquely what form the rules must take. We can distinguish three aspects of a rule which may vary while its logical content remains fixed: its formulation, its structural effect, and its manner of application. These terms are somewhat imprecise, but they are meant to be suggestive rather than scientific. The following examples should give an idea of what I have in mind: (1) As mentioned in Chapter 1 above, NJ can be formulated as a sequential system by writing the assumptions on which each formula occurrence depends to its left (followed by a turnstile). Then A-introduction, for example, becomes A-right, and A-elimination becomes the rule T\- AAB T\-A (or the same rule with conclusion T \- B). As Prawitz remarks, this is just a trivial reformulation of the system and is not to be confused with the calculus of sequents LJ. Equally, we could reformulate the rules of LJ after the pattern of natural deduction rules. (2) The schematic description of a rule does not by itself tell us what sort of structure results from applying it. It has often been pointed out, for example, that the rules of LJ can be interpreted as rules for constructing derivations of NJ, in addition to their intended interpretation as rules for constructing trees labelled by sequents. 3-elimination provides another example. Instead of interpreting it in the usual way, we could specify that the tree resulting from an application of this rule is obtained by placing a copy of the derivation of the major premise above each assumption discharged by the inference (in the derivation of the minor premise). (3) Even allowing for differences of kinds (1) and (2), it is clear that left rules and eliminations are not the same. If we reformulate A-left, for example, as a natural deduction rule, it becomes a sort of upward version of A-elimination. The difference is not a matter of meaning, but rather of how the rule is applied. Prawitz, taking his cue from the rules for V and 3 , has given a more general formulation of what constitutes an elimination. Using it, we can write Gentzen's rule for inferring from a conjunction as [A]
n
n'
AAB
C
c
Natural Deduction Revisited
49
Now, A-elimination is the special case of this rule in which II' is empty, and A-left the case in which II is empty. Differences of this kind may affect the deductive strength of a calculus, but it is misleading to think of one form of the rule as being stronger or weaker than the other. 3 When the rule for inferring from an existential formula is rewritten so as to agree in its formulation and manner of application with the rules for the other connectives (excluding V), the result is a rule of existential instantiation, i.e., something of the form 3xA(x)
~MbT Unfortunately, a rule of this kind has its drawbacks. Although it can be given a convenient formulation in classical logic by introducing a function from formulae to variables as an auxiliary syntactic device—a point I shall return to later—in the intuitionistic case awkward restrictions must be placed on some of the other rules ( V- and —•-introduction, at the very least) to ensure validity.4 Furthermore, even a deductively satisfactory formulation leads to problems when normalizability properties are taken into account. If the parameter b is supposed to depend only on the premise 3xA(x), the normalization theorem does not even hold, as the following example shows: 3
Later on, we shall encounter an example of a calculus which is incomplete, not because its rules are too weak, but because the conventions governing their application are inadequate. In the present case, the cut-elimination theorem ensures that no such problem can arise. Nevertheless, we have the impression that left rules are weaker than eliminations because not every NJ derivation corresponds to a derivation in LJ without cut. To see why this is misleading, it is helpful to think of left rules as eliminations applied upwards. Applying eliminations and introductions downwards, as in NJ, yields a larger class of derivations than applying eliminations upwards and introductions downwards, as in LJ, unless the latter calculus is supplemented by a cut rule. We can, however, also imagine introductions being applied upwards—although it then becomes more difficult to formulate the restrictions necessary to ensure their validity—and consider a calculus in which eliminations and introductions are both applied upwards; its derivations will just be those of NJ. Finally, if eliminations are applied downwards and introductions upwards, the resulting calculus is incomplete without the addition of a structural rule of some kind. (This last idea may seem far-fetched, but as a matter of fact tableaux proof procedures are of this form.) The above considerations suggest that, although the convention that all rules can be applied in the same direction is less restrictive than the alternatives, there is no sense in which an elimination is stronger than its corresponding left rule or vice versa. 4 See derivation (5.2) below for an example of why some restriction on —•-introduction is needed. The case of V-introduction is a little different. In both classical and intuitionistic logic, some device is needed to block the inference from 3xB(x) to VxB(x). There is a simple and elegant way to do so for classical logic, but it depends upon interpreting ->V as equivalent to 3-i and hence cannot serve for the intuitionistic case. (See Chapter 5 below.) The only other possibility seems to be the clumsy flagging restrictions on variables which are usually incorporated into systems of natural deduction containing a rule of existential instantiation.
50
Normalization, Cut-Elimination and the Theory of Proofs
3x(A(x) A B(x)) A{a) A B{a) A(a) 3xA(x) 3xA(x) A(b) A(b)
n
rr C(b)
(Here, and below, all the occurrences of b shown are supposed to be connected. 5 ) If we stipulate that b depends on the derivation of 3xA(x) as well, the following example shows that the normalization theorem will only hold if we allow reduction steps which simultaneously eliminate more than one maximal occurrence of a formula.
n
n
A(a) 3xA(x)
A(a) 3xA(x) Il2_
III
C(b) (This is so even if we allow interreducible derivations of 3xA(x) to determine the same parameter.) It seems, therefore, that we would be ill advised to replace 3-elimination by a rule of existential instantiation. Our aims will be better served by simply altering its structural effect in the manner suggested above. For the time being, we propose to exclude redundant applications of 3-elimination and restrict our attention to the fragment of NJ without 5 Two occurrences of a term t in a derivation are said to be connected if there exists a ^-connection between the formulae in which they occur. A ^-connection is a connection each of whose members contains t, and a connection is a sequence ai,...,an of formula occurrences such that for a l i i (1 < n) one of the following conditions hold:
(1) (2) (3) (4)
cti and a^+iare both premises of an application of —•-elimination. oti lies immediately below OL% + I or vice versa. a.i and a i + 1 are both discharged by the same inference. oti is discharged by an application of —^-introduction whose conclusion is a i + i , or vice versa. When we are dealing with NJ or fragments thereof (as opposed to a calculus of the sort considered in the text), this definition must be modified so as to conform to Zucker's. (See Def. 2.5.1 on page 34 of his paper.) In effect this means qualifying clause (2) with the proviso that ct{ not be the major premise of an application of V- or B-elimination, and adding the clause: c*i is the major premise of an application of V- or 3-elimination and ct{+i is discharged by the inference, or vice versa.
N a t u r a l D e d u c t i o n Revisited
disjunction (Nj(
v
^). Consider a calculus which is like Nj(
ni
v
51
) except that
[Mb)] n2
3xA{x)
C C instead of being the tree obtained from IIi and n 2 by adding a new vertex labelled C below their bottom vertices, is obtained by placing a copy of III above each assumption discharged by the inference and connecting each conclusion 3xA(x) with the corresponding occurrence of the assumption A(b). (Call this calculus JVJ<-V>'.) As I remarked earlier, it is a feature of 3-elimination that an application can be permuted with other inferences without destroying the derivation in which it occurs. The only restrictions are: (1) that it cannot be permuted downwards past an inference which discharges an assumption in the derivation of its major premise, and (2) that it cannot be permuted upwards past an inference whose premise(es) contain the proper parameter of the elimination. To minimize the inconvenience of (2) and to ensure that the proper parameter property, 6 for example, is preserved by such permutations, let us adopt the convention that derivations which result from one another by relettering all the connected occurrences of a parameter (subject, of course, to the obvious restrictions on such substitutions, and provided that the open assumptions and conclusion of the derivation remain unchanged) are to be identified. This means, for example, that the following two derivations of NJ^~^ can be obtained from each other by permutation: [A(b)\
[A(b)\
[A(b)\
ni
n2(6)
rix
C
D
3xA{x)
3xA(x) _C
CAD
C
[A(c)}
n2(c) 3xA(x)
D D_
CAD
CAD
where c is a parameter which occurs nowhere in 112(b). Another example of such a pair is:
[Mb)} ni C{b) 3xA(x) 3yA(y) 3yC(y) 6
1Mb)] n; 3xA(x)
C(d)
C(d) 3yC(y)
This is explained for both natural deduction and sequent derivations in Zucker's Definition 2.5.1 under the name of the proper variable property. For the former it means that, if a occurs as the proper parameter of an inference, then all the occurrences of a in the derivation are connected to each other. For the latter, it just means that every occurrence of the proper parameter of an inference in a derivation occurs above that inference.
52
Normalization, Cut-Elimination and the Theory of Proofs
where d is a parameter occurring nowhere in IIi, and Ili is obtained from III by substituting d for all occurrences of b connected to the one in the conclusion C(b) (provided that the two occurrences of b shown on the left are not connected). I shall not write out anymore cases since they are pretty obvious. The only point to bear in mind is that we may assume parameters are relettered in an appropriate way whenever it is necessary or expedient to do so. Let ^ be the equivalence relation on the derivations of ATj(~v) generated by this set of permutations. There is an obvious correlation between such derivations and those of Nj(~v^', which can conveniently be represented as a map ip : Nj(~v^ —• Nj(~y^f. Furthermore, it is not hard to see that for a i m , IT in NJ^~^
n ~ n ' iff ^(n) = ^(n/). This can be proved by the same technique as Zucker uses to prove the result cited as Theorem 2.6 in Chapter 2 above. 7 From left to right it is just a matter of checking all the cases, and from right to left the proof is by induction on the length of ^(U), the various cases being determined by the last rules of II and IT. (The only non-trivial case is when at least one of these is 3-elimination.) Composing ip with the mapping
n A{c) 3xA(x) A{b)
yx
n A(c) II"
ir where II" results from II' by substituting c for all occurrences of b connected 7 8
See pp. 51—65 of his paper. These reduction steps are listed as cases (Bib), (Blh) and (B2d) in Appendix A below.
N a t u r a l D e d u c t i o n Revisited
53
to the one shown. The drawback to this procedure is that it takes us outside the class of NJ^~v^f derivations. To implement it we would need to introduce a wider class of figures (obtained from the derivations by adding the rule
n 3xA(x) for all parameters (b) and show that every derivation reduced to a normal one despite possible detours via these quasi-deri vat ions. An alternative procedure would be to define reduction steps in terms of how Nj(~v^' derivations are constructed—in effect, this means in terms of Nj(~y) derivations. So, we would have for all cr, r in NJ^~V^ ayxr
iff II >-i n ;
for some II, IT such that i/j(Ti) = a and ^(IT) = r. Finally, there is an intermediate procedure which requires that maximal formulae-occurrences be removed one by one, except in the case of existential formulae all of whose a-connected occurrences can be removed at the same time (where a is supposed to be the proper parameter of the elimination). How close a correspondence obtains between normalization in Nj(~v^' and in the other calculi depends, of course, upon the particular normalization procedure chosen. But for all those sketched above we have at least the following: (1) For all 11,11' in 7Vj(~v), n yx II' implies ^(11) y >(II').9 (2) For all a, r in Nj(~v^, a ^ T implies that there exist II, II' such that
a = >(n), r y ip{W) and n yx n 7 . 10
(3) For all dj in L J ( " V \ d yx d' implies 0 o tp(d) y
54
Normalization,
Cut-Elimination
and the Theory
of Proofs
whereas, going in the opposite direction, a single step may need to be combined with some others before it corresponds to one step (as in (2) and (4)). Furthermore, the correspondence is close enough to allow results to be translated from one calculus to another. In particular, strong normalization and uniqueness of normal forms in Nj(~w^f translate via 0 and ip into the corresponding results for Nj(~v^ and LJ^~y\ Although normal forms may not be unique in NJ^~V^ or LJ^~y\ they will be equivalent under ~ and = , respectively. As for strong normalization, it can be translated into the assertion that every non-repeating reduction sequence terminates. Because equivalence classes of derivations under ~ are finite, permutations can be included among the reduction steps for NJ^~V\ In the case of LJ^~y\ however, equivalence classes under = are not finite, as Zucker already observed for the negative fragment. Nevertheless, by utilizing the asymmetry of the reduction steps, he was able to show that any sequence of reductions in LJ~ must either terminate or have infinite (The translat ion will also go repetitions and, the same holds for Lj(~vln the other way, if we adopt the second of our normalization procedures for ATj(-v)', for it allows y to be replaced by = in (2) and (4). 12 ) Although it is possible to define the class of Nj(~y>}' derivations directly, using a rule of existential specification, and to normalize them by removing one maximal formula occurrence at a time, such an approach is neither as natural nor as convenient as one would wish. Nevertheless, Nj(~y^f does provide what might be called a permutation-free representation of proofs (albeit for a restricted set of connectives) and a reasonable framework within which to compare normalization and cut-elimination procedures. To some extent, therefore, it vindicates my earlier remarks about the desirability of a revised treatment of V- and 3-elimination. Unfortunately, when it comes to full predicate logic, there is no satisfactory analogue of Nj(~v>}'. The problems encountered in attempting to extend the above treatment to disjunction form the subject of the next chapter.
11 A detailed account of how these results can be translated between the negative fragments of NJ and LJ is to be found in Sections 5 and 6 of his paper. 12 This will be so, however, only if Zucker's indexing conventions and contraction conversions are adopted as well. It does not hold for the versions of the sequent calculus presented in the appendices. This is because (4), even with y replaced by = , does not guarantee that reduction sequences in Nj(~v^f can be translated into reduction sequences in Lj(~w\ If we were to add the clause: Furthermore, for all d" such that d" =
4
The Problem of Substitution When the rule for inferring from a disjunction is rewritten so as to agree in formulation and manner of application with the elimination rules for —>, V and A, the result is something of the form Ay B A B There is an obvious non-standard feature of this rule, which some might argue disqualifies it from being regarded as one at all, namely, it lacks a unique conclusion. The idea of rules with more than one conclusion is, however, a coherent one and there is no reason to eschew it in principle.1 Indeed, something like it is embodied in the sequent calculus (in its classical version at least) for sequential systems are most naturally interpreted as defining a derivability relation between sets of formulae. (Although disavowed by Gentzen, such an interpretation clearly underlies his development of these calculi.) Formally speaking, a sequent rule has only one conclusion, an array of formulae—not that the same could not be said of the rule cited above—but if the difference is just a matter of form, there is little to be said against derivations with multiple conclusions. Ordinary practice certainly does not rule them out. An incomplete proof by cases, for example, can reasonably be regarded as an argument with more than one conclusion (and a completed one may contain more than one occurrence of its conclusion). Even if we believe that a constructive proof, by its very nature, can only establish a single conclusion, there is no reason why its formal representation should not be allowed to contain multiple occurrences of that conclusion. As a matter of fact, the method of representing proofs in natural deduction runs counter to ordinary practice in this respect. In an NJ derivation an assumption is written down each time it is used, while the conclusion is a unique formula occurrence. When arguing informally, x It has been treated at length by Shoesmith and Smiley in their book MultipleConclusion Logic (Cambridge, 1978), and more recently by Girard. See his discussion of proof-nets in Section 2 of "Linear Logic," Theoretical Computer Science, Vol. 50, 1987, pp. 1-102.
55
56
Normalization,
Cut-Elimination
and the Theory of Proofs
on the other hand, it is rare to copy down assumptions more than once and not unknown for the conclusion to be written out several times. The revised rule for V-elimination given above is clearly valid in both the intuitionistic and the classical sense, provided that its conclusions are interpreted disjunctively. Classically, this means that at least one of them must be true; from a constructive point of view, it is perhaps better to think of a derivation with more than one conclusion as an unfinished argument which is completed by showing that a particular formula follows from each of its conclusions. When combined with the other rules, however, the presence of multiple conclusions requires us to place some additional restrictions on V- and —•-introduction in the intuitionistic case. These should have the effect of blocking inferences from Vx(A(x) V B) to VxA(x) V B and from A —> (B V C) to (A —> B) V C. Such restrictions can be avoided by adopting the expedient used earlier in the case of 3-elimination, namely, leaving Gentzen's formulation of the rule intact while altering its structural effect. This would not get rid of multiple conclusions, but it would ensure that each conclusion in a derivation was an instance of the same formula. Furthermore, even if the direct rule given above is retained, Gentzen's Velimination will have to be translated into a (derived) rule of the resulting multiple-conclusion calculus (call it NJ1) when the need arises to map the derivations of NJ into those of NJ'. For these reasons I shall turn now to the question of how such a translation is to be accomplished. The basic idea is straightforward enough. I want to interpret
n
AMB
[A]
[B]
iii
n2
c
c
c
as the figure obtained by simultaneously substituting the conclusions A of
n AM B A B for the assumptions A of A rii C and the conclusions B of this same derivation for the assumptions B of B
n2 c It is, however, by no means obvious what substitution is to mean in this context. The derivations of NJ are just trees and we know exactly what it means to substitute a derivation II of C, say, for each occurrence of C
The Problem of Substitution
57
as an open assumption in some other derivation II'. The derivations of NJ', on the other hand, will be directed graphs which may have more than one bottom vertex labelled by the same formula. Our aim is to define an operation on these figures which shares the basic properties of substitution for trees, and coincides with it in case the derivation being substituted is a tree. For example, assuming a has conclusions of the form A, we expect the result of substituting these for the assumptions A of r, written (<J/A)T, to satisfy the following conditions: iA i\
4 1}
<-
(4.2)
/ i A\n
i
a
(*M)c = { c ( * M ) £ R =
if C is A
otherwise ^
^
R
when R is a one premise rule, and (4.3)
{a/A)^W
=
WWEWA)DRI
when R/ is a two premise rule. In view of this, it appears that we must first decide what kind of figure results from applying a rule of inference to a graph derivation. Again, this is unproblematic in the case of tree derivations, but less so when we consider graphs—especially where two premise rules are concerned. Having made such a decision, we should then be able to define substitution, using the above conditions together with the requirement that it be an associative operation. If proofs are regarded as functions, substitution is analogous to the operation of composition. For this reason alone, it ought to be associative. In the present context, however, even the property of associativity needs to be generalized. What I have in mind can be illustrated by considering the properties of composition applied to functions of more than one argument. Suppose that f: Xi x X2 *-+ X, g: Yi xY2 ^ Y, h: X xY \-> Z and i: Vi x V2 •-»• X\. Thenfto/,for example, is a function from (Xi x X2) x Y to Z a n d , just as (hof)oi = ho(foi), so (hof)og: (X\ xX 2 )x(Y"i xY2) »-» Z must be equal to {ho g) o / . Because the derivations under consideration here correspond not only to functions of more than one argument, but also to ones which yield an array of values, substitution should satisfy not only conditions of this sort but also their duals. To illustrate what I mean by the dual oi(hof)og = (hog) of (subject to the obvious restrictions on the domains and ranges of / , g and h) consider the derivations II, II! and II2 on the preceding page, and let a be the figure which results from applying the rule AW B A B to II, then it should be the case that (4.4)
((a/A^/B)^
= (((T/B)n 2 M)n 1
58
Normalization, Cut-Elimination and the Theory of Proofs
provided that B is not among the conclusions of IIi nor A among those of
n2. Recall that the assumptions in an NJ derivation are supposed to be grouped into equivalence classes, and that the application of a rule of inference discharges all the members of at most one such class (or a pair of such classes, in the case of V-elimination) rather than all occurrences of a given formula as an assumption. Rules can therefore be described as applying to equivalence classes of formula occurrences, rather than to individual occurrences or to all occurrences of the appropriate form. Since each NJ derivation has a unique formula occurrence as its conclusion, these distinctions become blurred in the case of the direct rules. In a multiple-conclusion calculus, however, this is not so, and it is convenient to group conclusions into equivalence classes as well while retaining the convention that rules are applied to such classes. The class to which a formula occurrence belongs will be indicated by its subscript, a natural number. Whether these subscripts are formally a part of the calculus or merely an auxiliary device will be left open. In the latter case any renumbering of subscripts which does not affect the composition of the various equivalence classes leaves a derivation literally unchanged; in the former, any such renumbering is said to produce a congruent derivation and, for (almost) all practical purposes, congruent derivations can be treated as identical. Pursuing the previous example in the light of these considerations, it is always possible to ensure that ((a/A^/B)^
= ((<7/B)n 2 /4)IIi
or, more properly, that
((*M„)ni/B m )n 2 = ((a/B m )n 2 Mjn 1 by carrying out some appropriate resubscripting when necessary. Furthermore, this can be done in such a way that the subscripts on the assumptions and conclusions of the resulting derivation remain unchanged. Let (^fc/n)n denote the result of replacing all the assumptions An of II by ones of the form Ak\ similarly, Yl{Ak/n) is the result of replacing all conclusions An by ones of the form Ak- The meaning of simultaneous in the phrase 'simultaneous substitution' can now be explained. Given a figure a which includes An and Bm amongst its conclusions, the result of simultaneously substituting the conclusions An of a for the assumptions An of IIi and the conclusions Bm of a for the assumptions Bm of II 2 , is defined as a a ( B f c / m ) M n ) n i / B f c ) ( B f c / m ) n 2 or ( ( a ( ^ / n ) / 5 m ) n 2 M P ) ( ^ / n ) n 1 for suitable choice of k and p. 2 The point of all this is that the order in which these consecutive substitutions are performed does not affect the result; they can therefore be treated as having been carried out simultaneously. A 2 k and p are suitably chosen if they avoid any clashes of subscripts. It is sufficient for this purpose to require that A; occur nowhere in 111 and p occur nowhere in Eb-
The Problem of Substitution
59
more suggestive notation for the figure denoted by the above expressions is
n AW Bt
ni n2 and I shall adopt it, modified according to context, for simultaneous substitutions in general—at least until the end of this chapter. 3 Condition (4.4) above is just one of a group which together constitute a generalization of associativity in the sense that they characterize the basic properties of substitution for multiple-conclusion derivations (or composition for functions of more than one argument and value). It is easy enough to supply the others but, with one exception which is of particular relevance to the discussion, I shall not do so here. Using T,T',. .. etc., to range over possible multiple-conclusion derivations, the condition in question can be written as follows: If Cp occurs among the conclusions of r and r', and An is not an assumption or conclusion of r", then (4.5)
((T/An)r'/Cp)r"
=
/An)(r'
((T/CP)T"
/Cp)r"
The above is of interest because, when r is (a/Bm)Il2, a figure of the form
r' is III and r" is
C
P
R
D
or
5P
Er
,
D Uq
it asserts that
n
n
AVBk
AVBk An
t>m
An
£>m
nx
n2
ni
n2
Op
Cp
Dq
R
DqK
DqK
or the analogue of this equation with R/ in place of R (provided that substitution is assumed to be associative). Read as a reduction from left to right rather than as an equality, the above is just a notational variant of what Prawitz calls VE-reduction (on the understanding that II, IIi and II2 are NJ derivations and at least one occurrence of Cp is maximal). The situation is reminiscent of Nj(~v^. Once the decision was made to interpret 3-elimination in terms of substitution, the basic properties of this operation ensured that permutative 3-reductions became redundant. So, here, S u b s t i t u t i o n as defined in Chapter 5 does not satisfy (4.4). As a result, a slight reinterpretation of the notation introduced here will become necessary.
60
Normalization, Cut-Elimination and the Theory of Proofs
the properties of a more general notion of substitution rule out permutative V-reduetions. What I have done so far is to adumbrate a kind of multiple-conclusion variant of natural deduction. I have not, it is true, stated its rules— although it is not hard to imagine what form they ought to take—but have simply discussed some of its general properties. More significantly, I have not yet described the structural effect of applying a rule in this calculus. It is the latter that concerns me most here but, in order to have a fixed framework within which to consider it, I will begin by listing a set of propositional rules: (In what follows ra, n , p , . . . range over natural numbers.) Axioms: Am Rules: (1)
s*-m
(2) a.
tjn
AABp (3) a.
(5)
(7)
Am Ay Bp
b.
Bm A-*BP
A A Bp
b.
AABP
Am
Bn Ay Bp
Ay Bp
(4)
An
(6) {An}
±P_
Bn
A.n
Dm A.
• tip
Bm (8)
An
An
TP
Some comments on the rules: (1) Rule (8) is clearly redundant and is included only for aesthetic reasons. Having introduced it to preserve symmetry, I propose to ignore it in the sequel because its presence complicates the discussion of normalization. (2) These rules are supposed to operate on multiple-conclusion derivations. This means, in particular, that they apply not only to derivations of their premises, but also to derivations whose conclusions include their premises. For example, rule (1) is to be interpreted as follows: if r is a derivation of A, Am from T and r' is a derivation of Af,Bn f r o m r , then r r' AABp is a derivation of A, A', AABV from T, V (where T, A , . . . range over sets of subscripted formulae). (3) Rule (5) is just —•-introduction; the notation {An} indicates that all occurrences of An as an assumption are discharged by the inference.
The Problem of Substitution
61
There is one small deviation from the usual formulation of the rule, however: it is convenient to allow an application to discharge any other occurrence of An as an assumption which may subsequently be incorporated into the derivation, provided that there is a path between it and some occurrence of Bm (as a premise of the inference) which does not pass through any occurrence of A —• Bp (as a conclusion of the inference). In view of what was said earlier about the properties of substitution, permutations of inference—where possible— cannot affect the structure of a derivation. As a result, the class of derivations is not enlarged by this feature of the rule. All it allows is more latitude in the matter of permuting inferences. (4) Rule (5) is the only rule which is not intuitionistically valid (because it is tantamount to inferring (B —• C) V D from B —• (C V D) ) . It could be made so by imposing suitable restrictions, but for the present it suffices that all the rules are classically valid—a fact which is easily verified. A set of rules 1Z determines both a consequence relation between (sets of) formulae and a class of figures called derivations. In the former case, rule (3a), for example, would be interpreted as if A, A is a consequence of T, then A, A V B is a consequence
of r and, in the latter, as the result of applying this rule to a derivation of A, A from T is a derivation of A, A V B from T. The consequence relation or class of derivations determined by 1Z is the least such relation or class closed under the rules of 1Z, when appropriately interpreted. These two notions are connected by the derivability relation (where A, A is said to be derivable from T if there is a derivation of A, A from T) because it is usually taken for granted that, for any 1Z, derivability by 1Z coincides with consequence by 1Z. It is, however, slightly misleading to speak of 1Z determining a class of derivations. In fact, there may be many classes of figures which qualify. To fix the class of 1Z derivations, it is necessary to specify the kind of object which the rules of 1Z are supposed to construct (whether, for example,they are to be sequences, trees or graphs, and how their nodes are to be labelled). The rules of % can then be interpreted as operations on these objects, and the 1Z derivations will be the smallest class closed under the operations. (The rules of LJ, for example, as remarked earlier, admit more than one such interpretation.) In other words, it is necessary to specify a notion of derivation. It can happen, however, that the notion in question is defective: it may not be closed under all the requisite operations, for example, or it may be closed under them, but their application may not always yield the desired result (i.e., a
62
Normalization, Cut-Elimination and the Theory of Proofs
derivation of the required conclusions from the assumptions given). Such deficiencies may even carry over to the derivability relation so that it no longer coincides with consequence. I propose to call a notion of derivation adequate for % if the relation of derivability by K which it determines coincides with the consequence relation for 1Z.4 It should be obvious, especially to anyone familiar with Gentzen's calculus LK, that (l)-(7) represent a complete set of rules for the classical propositional calculus in the sense that (I\ A) is in the consequence relation determined by (l)-(7) iff /AT —• W A is a tautology. It still remains to specify a notion of derivation which is adequate for l)-7) and satisfies the properties described earlier in connection with the substitution operation. Derivations, according to what was said above, will be graphs whose vertices are labelled by formulae (together with some additional information, perhaps). Not every such graph will be a derivation, however, even if it is built up exclusively from configurations like A B \ / AAB
B 1 AWB
etc.
which exemplify instances of the rules. To exclude figures like Ay B A B AAB for example, we stipulate that the premises of rules (1) and (6) cannot both be conclusions of a single derivation. Following the usual practice for NJ, each axiom of the form Am will be represented by a one-element graph whose only vertex is labelled Am. So, the problem is to generate a class of derivations from these one-element graphs by means of operations which represent the rules of inference—in other words, to describe the structural effect of applying a rule. Unfortunately, there is no way to do this without abandoning at least one of the requirements discussed above.5 4 These ideas are drawn from Shoesmith and Smiley. See, in particular, page 26 of their Multiple-Conclusion Logic. They use "deduction" and "deducibility" where I have used "derivation" and "derivability." Aside from this, their explanation of adequacy differs from the above in only one respect: I have defined adequacy relative to a set of rules, whereas they call a notion of deduction adequate if it is adequate for every set of rules. The coincidence of derivability and consequence for a set 71 of rules is of course not the same as the completeness of 7£. In principle, the issue of adequacy arises in the case of both single- and multipleconclusion rules, but it assumes no practical importance in the former case. This is because there are few alternatives to consider when defining deduction for single-conclusion calculi, and the only two taken seriously—deductions as sequences of formulae and deductions as trees—are both adequate in this sense for any set of rules. In the multipleconclusion case, however, matters are less straightforward, as we shall see below. 5 1 have attempted to place as few restrictions as possible on what kinds of figures
T h e P r o b l e m of S u b s t i t u t i o n
Theorem 4.1 Any notion of derivation which is adequate for rules cannot have an associative substitution operation.
63
(l)-(7)
The remainder of the chapter is devoted to proving this claim and discussing some of its consequences. Assume the rules of inference have been represented as certain operations on graphs and that substitution has been defined in terms of these in the manner suggested above. It then makes sense to talk about the class N of derivations generated by the rules, and to consider the subclass D of N generated by the propositional rules of NJ. (Alternatively, D could be described as the range of a mapping from propositional NJ derivations to N. It is preferable, however, to think of the rules of NJ as being reinterpreted so that they now yield members of TV.) The schematic descriptions of these rules coincide with those given above, except for -»- and V-elimination. In the case of the former, the difference is just a matter of notation. As for the latter, [A]
[B]
n
iii
n2
AvB
C
C
C is to be interpreted as
n AWB A B
iii n 2 Redundant applications of the rule, i.e., ones which do not discharge an assumption in both of the minor premises, are excluded as they were earlier in the case of 3-elimination. The problems these present will be discussed later. Lemma 4.2 Suppose that a is a particular occurrence of the configuration A\JB A B derivations could be. To call them labelled graphs is just a way of saying that they are structured arrays of formulae. As for the choice of one-element graphs to represent axioms, it is made for the sake of simplicity and definiteness. Nothing depends upon it and almost any other objects could serve as well. The argument which follows makes few additional assumptions about derivations; only that their structure is logically significant and that the rules of inference operate in a uniform way. It does, of course, depend upon the choice of rules—although not on these particular formulations of them—and, more importantly, on the stipulation that the two premises of an inference involving (1) or (6) not be connected prior to the application of the rule. These assumptions and the possibility of avoiding the result by doing without them are discussed briefly at the end of the present chapter.
64
Normalization,
Cut-Elimination
and the Theory
of Proofs
in a derivation II of T). Then U can be written as W AVB A B
ni n 2 for some n',IIi,Il2 in D, where a is introduced by the application of Velimination shown. Proof. By induction on the definition of D (i.e., on the rules of NJ). We think of II as being given together with a construction tree which establishes its membership in D, and carry out the induction on the number of steps in this tree. The basis step is trivial, and the induction step is taken care of by condition (4.5) above (which says, in effect, that V-elimination can be permuted downwards past any inference) except when the last step in the construction of II is another application of V-elimination. In this case II is of the form
nx CVD C D
nr n* and there are two subcases to consider: 1. a is part of II X . By induction hypothesis II x can be written as
n* AW B A B
ni rr2 There are a number of possibilities to consider according to whether C V D is among the conclusions of II*, II^ or n 2 . Suppose C V D occurs as a conclusion of all three; the other possibilities are similar and easier to handle. So, II x has the form IT Ay B
CVD
A
B
ni
n'2
CVD
CVD
I have not bothered to write in the subscripts and will simply assume that the only occurrences of C V D, A V B,C, D,A and B as assumptions or conclusions in I I x , I I * . . . etc., are the ones shown.6 Now 6 Obviously, this effect can be achieved with subscripted formulae by resubscripting where necessary, so there is no loss of generality.
The Problem of Substitution
65
(4.6)
n=
nx/D nx
_cyD/c
where (4.7)
nx CVD = C D
By applying (4.5) twice to the right hand side of (4.7) and because of the relationship between substitution and the application of a rule expressed by (4.1)-(4.3) above, we obtain (4-8)
n
*
CVD = ((n"M)n'1'/B)n'2' C D where
n* n" =
A\/B A B
C\JD C D
ni ny
= CMD C D
n2'
=
n2 So,
CVD C D
n = ((((n"M)n;7B)nZ/c)nr /i>)n2x.
(4.9)
Applying (4.5) twice to the expression within the outermost brackets yields
(4.io)
n = ((((n"/c)n 1 x /A)(n;7c)nr/B)(n^7c)nr/D)n 2 x
and, using (4.5) again to distribute Il 2 , we get (4.11) 11 =
((((n'7c)nr/D)n x M)((n' 1 7c)n 1 x /D)n x /s)((n 2 7c)n 1 x / J D)n x But, writing
n' AM B A B
66
Normalization, Cut-Elimination and the Theory of Proofs
for ((n"/C)n i x /Z))n 2 x and writing Ifc for ( ( n ^ 7 C ) n x / D ) n 2 x , where i = 1 or 2, (4.11) becomes
n' n=
AVB
A
IIi/£
n2
= AVB A B
ni n2 2. a is part of II* or n j . We might as well assume cr is part of II x (If it is part of IIJ, the argument is the same.) Here the various subcases (according to whether C occurs as an assumption in II*, n^ or Uf2) do not require separate treatment because it follows from (4.1)-(4.3) that, when A is not among the assumptions of <7, {T/A)<J = a. So, by induction hypothesis, II* can be written as
n* AVB A B
ni n'2 The remarks made above about the absence of subscripts, and the occurrence of A V B, C V D, etc., as assumptions or conclusions apply here as well, except that C may occur as an assumption in any or all of II*, II^ and II2 although I will not display it. Let r be
then (4.12)
n = (T/C) ((
AVB IA A Bl
) n;
/B
\ n'2
and we want to show that this is the same as (T/C)AVB/A\
(T/OWJB
I (T/C)W2
For this purpose I appeal to the dual of (4.5). Duality in this context means switching assumptions with conclusions, and the expression being substituted with the one being substituted into. The condition in question is (4.13)
(T/A)(T'/B)T"
=
((T/A)T'/B)(T/A)T"
provided that B is not among the assumptions or conclusions of r . Two
The Problem of Substitution
67
applications of (4.13) to (4.12) yield the desired result. Hence IT A
B
ni n 2 where
IT II' = (r/C)A V B A B
and n^ = (r/c)n; (t = 1,2).
•
If v is a vertex of a derivation II which is labelled Cn, v is called the occurrence v of C n in II. Definition 4.1 Suppose there is a configuration a of the form AVB A B in II and that v is a vertex ofU. v is said to join the disjunction Ay B in a if there are paths from v to both the occurrence of A and the occurrence of B in a which do not pass through the disjunction. Although I have not yet specified what operation on graphs corresponds to applying a rule of inference, the following holds at least. Lemma 4.3 The application of a rule of inference to a derivation II cannot introduce formula-occurrences which join any disjunction in II. Proof Again the proof is by induction on the definition of D, or rather on the definition of the class D ' which is like D except that no restriction is placed upon the conclusions of the minor premises in an application of V-elimination (i.e., they need not be the same as one another). Although Lemma 4.2 was stated only for D, the proof applies equally to D', so I feel entitled to claim this result for D ' as well. (Notice, incidentally, that D ' is just N.) The basis step is trivial. Now suppose II contains no joined disjunctions and that the application of a one-premise rule R, one of the form C_
a say, introduces a vertex v which joins some occurrence of the disjunction A V B in the resulting derivation
n C"
68
Normalization, Cut-Elimination and the Theory of Proofs
Then, by Lemma 4.2,
n' AM B
n=
A
B
iii
n2
c c
where the disjunction which becomes joined in
n c_ c is introduced by the application of V-elimination shown. Hence, by (4.5)
C = •=, ~
n'
n'
AM B A B ih n2
AM B A B nx n2
c_ c_ c
=
~
c_ c_ c c
But rules operate only on their premises, not on any other vertices of a derivation, so it follows from the definition of substitution that there can be no connection between the vertices of Iii C_
C and those of
n2 c_ c except via an occurrence of A V B—which means that no vertex in either of them can join such an occurrence. The case in which R is a two-premise rule, given our earlier stipulation that its premises must belong to separate derivations, is just a notational variant of the above. It only remains to argue that no disjunctions are joined as a result of applying V-elimination. By the definition of simultaneous substitution, it suffices to show that, if II and II7 contain no joined disjunctions, then neither does (U/A)IV. But this follows from the preceding cases by an induction on the construction of IT and the associativity of substitution.
• If rules are thought of as operating on individual vertices, rather than on groups of them with the same label, there is an obvious interpretation of applying a rule of inference, namely, the usual operation on tree derivations extended to the case of directed graphs. Suppose now that the formulae
T h e P r o b l e m of S u b s t i t u t i o n
69
appearing as premises in the set of rules given above are taken to be individual occurrences and that this interpretation is adopted, then the class of derivations generated in this way will be denoted by N'. Lemma 4.4 Let N be any class of derivations generated be rules (l)-(7) above whose members satisfy Lemma 4-3- Then, to each derivation II in N there corresponds Iic in N' such that II and Yic have the same assumptions and conclusions. Proof. Because each formula occurrence in a graph-derivation is supposed to follow from its immediate predecessors (or, when a vertex has two successors, to follow from it jointly in some sense), the application of a rule can only consist of performing the following operations (perhaps more than once): (1)
a. Add a new vertex labelled by the conclusion of rule (2), (3), (5) or (7) below a bottom vertex labelled by its premise. b. Add a new vertex labelled by the conclusion of rule (1) or (6) below a pair of bottom vertices each labelled by one of its premises. c. Add two new vertices, each labelled by one of the conclusions of rule (4), below a bottom vertex labelled by its premise. (2) Amalgamate a pair of vertices which are both instances of the same conclusion. Vertices in a derivation are said to belong to the same cluster if they were introduced by the same inference—except in the case of rule (4), where the vertices introduced are divided into two clusters depending upon whether they are labelled by the left or right conclusion of the inference.7 It follows from Lemma 4.3 that, when a pair of vertices is operated upon—as in (lb) and (2), the members of the pair must belong either to the same cluster or to separate derivations. In view of this it is a routine matter to show by induction on the rules that, if II is any derivation built up using (1) and (2), we can find II C as described above with the additional property that Uc has the same number of occurrences of each of its conclusions as II has clusters of them. (This stronger condition is needed for the induction step in the case of the two-premise rules.) • I claim now that derivability in N' is not the same as the consequence relation defined by the rules. Furthermore, there is no hope of salvaging the situation for intuitionistic logic by finding a suitably restricted version of the rules and showing that the corresponding subset of N' coincides with their consequence relation. In other words, there is at least one intuitionistically valid formula of the propositional calculus which cannot be derived in AT'. 7 I assume some way to differentiate between these two groups of vertices even if both conclusions happen to be the same formula (with the same index).
70
Normalization, Cut-Elimination and the Theory of Proofs
The argument I propose to give is based upon a similar one to be found in Chapter 8 of Shoesmith and Smiley.8 They set out to show the inadequacy—in the sense explained in footnote 4 above—of Kneale's notion of development, using a slightly different set of rules. As they point out, "Kneale's 'tables of development' are the pioneer multiple-conclusion proofs." 9 A table of development is essentially the same as a derivation of N' except that rule (5) is replaced by A
* A->B
B A-+B
where the rule on the left is one with no premises. I have preferred rule (5) out of deference to tradition and, more importantly, because it is based on the intuitionistic meaning of implication. Their rules on the other hand reflect the truth-table definition of this connective. (Of course, this is not to say that their rules cannot be restricted—or that ours need not be—to become intuitionistically valid.) The normal-form theorem holds trivially for this calculus, as does the strong normalization theorem, because the carrying out of any reduction step actually diminishes the size of a derivation (as measured by the number of its vertices). Smiley and Shoesmith therefore need only argue that there are tautologies which have no normal derivation to establish their result. The example which they provide, namely (A —* A) A (A V (A —• A)), is easily seen to be derivable in Nf, so I have been obliged to come up with a more complicated one. Nevertheless, it is misleading to describe one set of rules as being stronger than the other—after all, both can fairly claim to be complete formalizations of the classical propositional calculus. It is the way in which derivations are pieced together, rather than individual peculiarities of the rules themselves, which accounts for the existence of such examples. Lemma 4.5 The normal form theorem holds for N'. Proof. The proof, which I will just sketch, is adapted from the one for NJ to be found in Prawitz's monograph. 10 Notice that, despite the classical character of the rules for N', it is the proof for NJ not NK which adapts to the present case, and that it is simplified rather than complicated in the process. The reduction steps for TV'-deri vat ions are as expected. They include _L-reductions for each of the other connectives, namely, 8
loc. cit. They appeared first in his paper, "The Province of Logic" {Contemporary British Philosophy, 3rd. Series, ed. by H. D. Lewis, London, 1956), and subsequently in the aptly titled The Development of Logic (Oxford, 1962). 10 Natural Deduction, Chapter IV. 1 9
The Problem of Substitution
n
n
i AAB A all reduce to
n
n
±
_L BAA A
71
1 B
Ay B A B
B-*A A
n 1 A A-reduction is the same as for NJ. There are no permutative reductions, and V-reduction takes the following form:
n A JVB A
^
n A
, and
B
n A TVA B
^
n A
A
Finally, the reduction step for —> is
n B A-*B
ir {Ap} B
A
ir yi
n
B
Because the rules of N' apply only to single formula occurrences, there is no need to bother with subscripts except for assumptions, and the right hand figure in the statement of —^-reduction denotes the result of replacing each assumption Ap of II by a copy of the conclusion A of IT shown on the left (together with its derivation). Maximal formulae are defined in the usual way, and to each derivation II we assign a value (m,n), where m is the highest degree of a maximal formula occurrence in II and n is the number of such occurrences of degree ra. The reduction procedure is as follows: eliminate any maximal formula occurrence of highest degree unless it is an implication to which —•-reduction applies; in this latter case eliminate it only if the derivation of the minor premise contains no maximal formula occurrences of highest degree. It is easy to see that any reduction step performed in accordance with this procedure diminishes the value of a derivation—the pairs (m, n) are ordered lexicographically—and that the procedure can always be applied to a derivation which is not in normal form. From this the result follows, n Multiple-conclusion derivations do not have unique normal forms—at
72
Normalization, Cut-Elimination and the Theory of Proofs
least, not relative to the conventional type of reduction step. For example,
n A
Ay B A B W U" C D CAD D reduces to both
n
. n"
and A D by our procedure (assuming that Ay B and C A D are of the same degree). Furthermore, the choice of rule (5) as a formulation of —^-introduction serves to compound the problem, because a single group of assumptions may be discharged by more than one application of this rule. As a result, N' contains derivations of the form
n
ir A
B A^B
{Ap} B
n"
c A->C
,
{Ap} C
A
which may reduce to
ir n
n" n
B,C B,C (assuming that A —> B and A —> C are of the same degree and that 11', II" contain only maximal formulae of lesser degree). These are some of the problems involved in finding a correspondence between normalization in a calculus like Nf and normalization in NJ. None of this is strictly relevant to the matter at hand, however, although I will return to the topic later. Normal derivations have a particularly simple structure—more so even than in NJ. If a branch of a derivation is defined to be any subset linearly ordered by the edge relation which does not pass through (although it may terminate with) the minor premise of an application of rule (6), then every branch begins with a (possibly empty) series of eliminations, ends with a (possibly empty) series of introductions and, in between, (perhaps) an application of rule (7). It is obvious, therefore, that normal derivations have the subformula property. L e m m a 4.6 The formula *(A,B) A *(C, D) is not derivable in Nf where *(X, Y) abbreviates {X V Y) -+ ((X V Y -> X) V (X V Y -> Y)). Proof. Assume A, B, C, and D are all atomic. This makes matters a little
The Problem of Substitution
73
simpler, although it is not essential to do so. If the formula is derivable, it has a normal derivation, whose last rule of inference must therefore be an introduction, A-introduction, in fact. So, consider what form a derivation of *(A, B), say, can take. Again, the last inference must be an introduction, and the derivation will consist of an application of V-elimination followed immediately by introductions. An inspection of the rules reveals that it must look like the following: Ay B A (AvB)-+A (AVB^A)V{AVB^B) *(A,B)
B {AvB)-*B {AV B -» A)V (A V B -» B) *(A,B)
where the assumption AWB is discharged by any or all of the applications of —•-introduction. Similar considerations apply to the derivation of *(C, D). What is interesting about these derivations is that they cannot have fewer than two conclusions. It is apparent, therefore, that however many times A-introduction is applied there will always be a pair of conjuncts left over. Consequently, there can be no normal derivation of *(A, B) A *(C, D), only of *(4, B) A *(C, D), *(A B) or *(4, B) A *(C, £>), * ( C D). • Theorem 1, whose proof is now complete, indicates that there is a difficulty about defining substitution for the derivations of the propositional calculus. If this operation is to be well-defined, derivations which differ only in the order of their construction cannot be distinguished from one another. Yet I have just shown that there is no way to obliterate such distinctions entirely—at least not for any class of derivations sufficient to yield all valid formulae. The difficulty could be avoided, of course, by weakening the properties required of substitution. Unless we are prepared to change our understanding of this operation in more familiar contexts, however, there seems to be little doubt that they are appropriate ones. (This claim becomes even more plausible when substitution is considered in the abstract framework of category theory.) Anyway, although such an expedient would allow substitution to be defined for derivations of the kind considered above, it would leave their structural properties unaffected. In particular, there would still be distinct derivations which differed only in the order of their construction and, as a result, permutative reductions or their analogues would be required for a satisfactory normalization procedure. This circumstance, regrettable though it may be, seems unavoidable given the treatment of derivations and rules of inference adopted above. My aim, however, was to make it as general as possible. Permutative reductions will be needed not just for this particular multiple-conclusion calculus, but for all those systems (like Gentzen's N and L calculi—in both their intuitionistic and classical forms) whose rules can be interpreted as
74
Normalization,
Cut-Elimination
and the Theory of Proofs
generating a sufficiently large subset of its derivations. 11 Derivations are taken to be arrays of formulae structured by a logically significant relation and rules of inference are supposed to operate on their premises in a uniform way, but it is difficult even to imagine what would be entailed by doing without these assumptions. The same cannot be said for the stipulation that the two premises of an application of rule (1) or (6) be from separate derivations. If this restriction is relaxed and, more generally, even the conclusion of an inference is allowed to belong to the same derivation as its premise(s), application can be reduced to a single basic operation: connecting a pair of vertices, whether from the same or different derivations. It then becomes necessary to place some global restrictions on the structure of a derivation to ensure its correctness. This approach to multiple-conclusion logic is discussed extensively by Smiley and Shoesmith, who devote Part II of their book to the issues involved in piecing together derivations in this way. The flexibility gained by such a piecemeal method of construction does facilitate the definition of an associative substitution operation—several of them in fact, but this turns out to be a relatively minor advantage. It is hard to attach much intuitive significance to the joining of individual vertices, especially when they belong to the same derivation. Furthermore the resulting derivations, although they may not strictly speaking allow permutations of inference, do display structural differences which appear to be no more logically significant. (These will depend on whether different applications of rules with the same conclusion connect to one or more occurrences of it.) More importantly, our basic problem remains unsolved because we are looking for a homomorphism (with respect to the logical rules) from a familiar Gentzen calculus to one of this kind which will preserve substitution, and the image of A-introduction, say, under such a mapping must be a derived rule which transforms separate derivations of A and B into a derivation of A A B. Girard accepts the impossibility of finding a homomorphism of this kind. 12 He is led therefore to take a more radical approach to multiple11 Since Lemmas 4.2 and 4.3 depend upon the special form of rule (5), it might seem that the result could be avoided by replacing this rule with a more traditional version of —^-introduction, but this is not so. These lemmas will still hold for derivations in which no application of —•-introduction discharges an assumption in the derivation of one of the premises of an application of V- elimination lying above it, and it is only reasonable to require the structural effect of applying a rule in these special circumstances to be the same as in all others. 12 He too is interested in how to represent the proofs of classical logic by derivations whose inferences cannot be permuted with each other and which can be reduced to normal form without permutative reductions, and remarks in a number of places that it cannot be done. See, for example, page 9 of "Linear Logic" (Theoretical Computer Science, Vol. 50, 1987): "It seems that the problem is hopeless in usual classical logic and the accumulation of several inconclusive attempts is here to back up this impression." The results of the present chapter may also be seen as backing it up.
The Problem of Substitution
75
conclusion calculi, albeit one which may also be described as a piecemeal method of derivation construction. His rules operate by connecting pairs of individual formula occurrences, and the resulting figures are called proofnets if they satisfy a certain global soundness condition. Unlike the situation described in the previous paragraph, however, each occurrence must be the conclusion of exactly one application of a rule (axioms are treated as conclusions of a 0 premise rule) and the premise of at most one. What distinguishes Girard's approach above all is that he abandons the traditional logical vocabulary and studies a variety of novel connectives. In terms of these he is able to characterize fragments of classical logic with nice properties. There is no space here to do justice to his ideas and results, but they do provide further evidence for the interest of derivations with multiple conclusions.13 Another possibility would be to relax the requirement that a derivation must be an array of formulae. There are no doubt a number of ways to do this, but what I have in mind here is to treat them as sets of such arrays—sets of trees, in fact. For example, the derivation [A]
n AyB
[B]
nx n2 c
c
c
is to be interpreted as the union of
n
»°
A
B
"e? "c
with some notation to indicate that the occurrences of A V B shown are not among the conclusions of the resulting derivation, nor A and B among its (open) assumptions. This results in a rather unfamiliar notion of derivation which is difficult to reconcile with the idea of a proof as a determinate procedure for arriving at a conclusion. By removing the connection between the major premise of an application of V-elimination and the assumptions discharged by that application, it would appear that an essential component of the derivation has been lost. Furthermore, some justification is needed for treating A B AAB as an entirely different structure from AVB A B 13
His recent book Proofs and Types (Cambridge, 1989), written with Yves Lafont and Paul Taylor, includes a readable sketch of these ideas. A more detailed account is to be found in his paper "Linear Logic."
76
Normalization, Cut-Elimination and the Theory of Proofs
(After all, if the idea of a structural similarity between proofs and derivations is to be taken seriously, inferences have to be represented in a uniform way unless there is some compelling logical reason for not doing so.) Despite these objections, however, if the derivations of such a calculus could be shown to characterize the equivalence relation on natural deduction derivations generated by permutations of inference, they would at the very least be of formal interest. The impossibility of defining an associative substitution operator on derivations does not vitiate entirely this particular approach to logic since the derivations in question may still be (and indeed are) closed under substitution in the sense that, given derivations of A, A from T and of A' from T',^4, we can find a derivation of A, A' from r , r ' . Nonetheless, it is disappointing that there appears to be no straightforward extension of the negative fragment of NJ which preserves its distinctive combinatorial properties. This has consequences for both of the issues raised earlier. Granted that, whatever notion of proof is captured by NJ, it cannot be one in which the order of (permutable) inferences is important, there now seems to be little hope of representing this feature conveniently in a formal derivation. As for the correspondence between cut-elimination and normalization, had it been possible to exhibit a calculus C which needed no permutative reductions, together with homomorphisms from NJ and L J to C which preserved the proper reduction steps, this would have sufficed not only to establish a correspondence between reduction procedures for these calculi but also to make it plausible that permutative reductions are logically insignificant. Under the circumstances, it seems best to consider equivalence classes of derivations (the equivalence being generated by permutations of inferences) and try to interpret these within the context of a general discussion of the identity of proofs. If it can be successfully argued that equivalence in this sense and interreducibility (factored out by this equivalence) represent significant relations on proofs, it should be possible to make better sense of permutative reductions and reestablish a correspondence between cutelimination and normalization procedures.
5
A Multiple-Conclusion Calculus Before pursuing the line of investigation suggested at the end of the previous chapter, I think it worth considering multiple-conclusion systems of logic in a little more detail. I am less concerned with their intrinsic interest than with the fact that they seem to be the natural analogues of sequent calculi—for classical logic at least. As such, they provide a convenient framework for the comparison of N and L calculi in general, and a treatment of classical natural deduction which is superior to the conventional one. I shall attempt to substantiate these claims below after having first outlined a usable version of natural deduction with multipleconclusions. In the previous chapter, I discussed the relationship between consequence and derivability relative to a set of multiple-conclusion rules—for propositional logic at least—but failed to describe an adequate notion of derivation. In order to do so, I must specify what operation on graphs is to represent the application of a rule of inference. It is apparent that Lemma 4.3 cannot be expected to hold for such an operation, and derivations containing circuits will have to be allowed. I do, however, want to exclude circuits formed by joining a number of occurrences of the premise of a rule (or a number of occurrences of each premise in the cases of rules (1) and (6)) to a single occurrence of its conclusion (or a single occurrence of each conclusion in the case of rule (4)). My objection to this procedure is that its effect is to reintroduce the kind of structural features which prompted the search for alternatives to Gentzen's TV calculi in the first place. Consider for simplicity any one premise rule A R(A)R and suppose that the result of applying R to
n A,B,... 77
78
Normalization, Cut-Elimination and the Theory of Proofs
is the derivation
n R(AY
' •"•
obtained from II by adding a new vertex labelled R(A) below all the bottom vertices of II labelled A. (There is no loss of generality here since it is easy to restrict the application of the rule to fewer occurrences of A by appropriate use of subscripts.) Now, the obvious mapping from NJ derivations to multiple-conclusion ones constructed by rules which operate in this way, call it T, is clearly an isomorphism between its range and the derivations of NJ. So, for example, the difference between
n CV D
[C] [D] rii n 2 A A A
and
[C] ni A n CV D R(A)
R(A)
[D]
n2
A R(A)
R(A)
will be reflected exactly by the difference between the multiple-conclusion derivations CVD C
CVD D
.F(ni) T(u2) A A R(A)
C
and
D
^(no A R(A)
T(ii2) A R(A)
(It is conceivable that there is some alternative to T which would avoid these consequences, but it is hard to imagine what it would look like. On the whole, this particular line of inquiry seems not worth pursuing.) These considerations lie behind the development which follows. Since (4) is the only rule with more than one conclusion, a derivation having exactly two conclusions, both labelled ^4, can be represented by a figure of the form
nx CVD n = c D n2 n3 A A where IIi,Il2 and II3 are all single-conclusion derivations. II specifies two ways of reaching the conclusion A, depending on how the disjunction C V D is decided. (C V D is decided when IIi, or the result of substituting derivations of the appropriate kinds for the assumptions of IIi, contains a proof of C or of D in the sense that it reduces to a derivation whose last inference . C D , lS —.D °r C - D - }
A Multiple-Conclusion Calculus
79
Similarly, a derivation having exactly three conclusions, all labelled B, can be represented by a figure of the form
ni
ni
EVF E W
n'2
=
EV F E F
n;
or
GVH G H
n2
n"
B
n3
G\/H G H
K n'5
B B B B Suppose now that we want to derive A A B from II and II'. The resulting derivation should specify six ways of reaching the conclusion A A B, depending upon how the disjunctions C V D , E\/ F and G V H are decided. These can be represented in a single derivation by taking three copies of II and two of II7, and joining their conclusions to six new vertices labelled A A B in the following manner. (5.1) CVD C D
ni CVD C D
rii CVD C D
A
A
EWF E
n2 n 3 n2 n 3 n2 n 3 A
AAB
A
A/\B
A
AAB
A
AAB
ni F
EWF E
F
n2
n3
n'2
n^
GV#
B
GVif
B
AAB
AAB
This is not simply an arbitrary arrangement. It is designed to ensure that, no matter how each disjunction in II and II' is decided, there will always be at least one way to reach the conclusion .AAB. (Of course, a disjunction can be decided in only one way in all copies of a single derivation.) Furthermore, in view of the preceding discussion, each such way should be represented by a different path. If we add the (quite reasonable) requirement that each one should be represented by at most one path, then the above arrangement is the only possibility.
80
Normalization, Cut-Elimination and the Theory of Proofs
Guided by this example, I define below an operation of combination on graphs. The application of a rule of inference will then be interpreted as the combination of graphs, one of which has a special form. So, the above derivation is obtained by combining II and II' with the graph A B AAB (This notation is explained in the next paragraph.) In the case of a onepremise rule, for example (3a), its application to U should result in a derivation of A \JB obtained by adding two new vertices with this label, one below each occurrence of the conclusion A. Such a figure can be obtained by combining II with the graph A AVB Before proceeding further I need to introduce a few conventions: (1) I will write, for n > 0, A!... An A for the graph comprising n vertices labelled A\, . . . , An, respectively, which are joined to a single vertex labelled A below them. A Al...An is the graph obtained from Ax... An A by reversing the direction of the edge relation. Finally, I will use A to denote the graph consisting of a single vertex labelled A. (It should be obvious from the context when A is being used to denote a formula, and when a graph.) (2) As mentioned earlier, formulae occurring in a derivation will be assigned natural numbers as subscripts. For the present, these subscripts are to be considered part of the formalism of the calculus. The use of subscripts is simply a bookkeeping device. It corresponds to the use of sequences of formulae on the left and right of a sequent, and generalizes the idea of equivalence classes of assumptions which is routinely employed nowadays in the treatment of natural deduction. When derivations are regarded as instances of valid argument forms which can be combined together to produce new forms, subscripts make it possible to preserve distinctions which would otherwise be lost. They serve as place holders in much the same way as variables do in the usual notation for functions and terms. There is also a strong reason to use subscripts if one is interested in the strong
A Multiple-Conclusion Calculus
81
normalization theorem since this is known to fail for the version of NJ (in fact, even for its pure implicational fragment) in which either all assumptions of the appropriate kind are discharged by an application of —•-introduction or none are. Subscripts are the means of distinguishing between different occurrences of a particular assumption when it is desirable to do so. Although not absolutely necessary, matters are simplified if a graph comprising a single vertex is labelled by a formula with a pair of subscripts, its subscript as an assumption and its subscript as a conclusion. The advantages of this modification are twofold. It facilitates the comparison of multiple-conclusion calculi with the familiar Gentzen ones, and it simplifies various definitions below by eliminating the need to treat one element graphs as special cases. I will write the subscript as an assumption above the subscript as a conclusion so that, for example, the one element graph labelled by A whose subscript as an assumption is i and as a conclusion is j will be denoted by Aj. A one element graph labelled in this way will be called an axiom. It is convenient to be able to describe the axiom A1- as having a top vertex labelled Ai and a bottom one labelled Aj. I propose to adopt this manner of speaking henceforth, even though it suggests erroneously that the graph in question has at least two vertices. (3) Any axiom and any directed graph with at least two vertices, each of which is labelled by a subscripted formula, will be called a quasiderivation. The labels of the top vertices of a quasi-derivation are called assumptions, and those of its bottom vertices conclusions, but I will also use these terms to refer to the vertices themselves. Again, it should be obvious when a formula occurrence is meant, and when a vertex. (4) Graphs which are identical except for their vertices are said to be copies of one another. There is no need to distinguish between different copies of the same quasi-derivation, and I will always assume that distinct graphs and distinct copies of the same graph have disjoint set of vertices. (5) If II is a quasi-derivation which is not an axiom, I will write (^4^)11 for the quasi-derivation obtained from II by using Ai to relabel all assumptions of the form Aj. Similarly, U(Ai/j) is the quasi-derivation obtained by using Ai to relabel all conclusions of the form A3. If II is an axiom, say C^, (Ai/j)H is C^ if C = A and n = j , and II otherwise. Similarly, U(Ai/j) is Cf if C — A and m = j , and II otherwise. Anticipating the comparison with LK, I will describe (Ai/j)U as having been obtained from II by left contraction and U(Ai/j) by right contraction.
82
Normalization,
Cut-Elimination
and the Theory of Proofs
Definition 5.1 Given Ak, let II be a quasi-derivation with m bottom vertices labelled Ak and II' be a quasi-derivation with n top vertices labelled Ak,(m,n > 0). Suppose also that these vertices are enumerated in some arbitrary way, and let I I I , . . . , II n be n copies of II and U[,..., 11^ be m copies of IT. [II, j4fc,n/], the result of combining the conclusions Ak of n with the assumptions Ak of IT, is defined as follows: (1) If n is an axiom A[, [II, Ak, IT] = {Ai/k)Il'. (2) If IT is an axiom A\, [II, Ak, IT] = Tl(Ai/k). (3) If neither II nor IT is an axiom, [II, Ak, IT] is the graph obtained from the union of I I i , . . . , II n ,!![,... 11^ by identifying the vertices V(p^ and y(p'q) for each p, q (1 < p < n, 1 < q < m), where V(p,g) is the qth bottom vertex of Up labelled Ak and v^p^ is the pth top vertex ofn^ labelled .4*. It is easy to verify that, because copies are not distinguished from one another, [II, a, II'] does not depend upon the particular enumerations chosen for the conclusions a of II and the assumptions a of IT. Also, suppose for a moment that combination has been defined for graphs labelled by unsubscripted formulae as well, then figure (5.1) above can be written as
n',s,
TI,A
A B A/\B
(provided that B does not occur among the assumptions of II) or n,yi, IT,JS,
A B] AAB
(provided that A does not occur among the assumptions of II'). The next task is to explain what it means to be a derivation in this calculus. For this I need some additional notation and terminology: (1)
a. Two quasi-derivations are said to be congruent if they are obtainable from one another by a one-one mapping T between labels which satisfies the condition: For all A, i, T{Ai) = Aj for some j . b. Call a subscript occurrence intermediate if it is not part of the label of an (open) assumption 1 or conclusion. Quasi-derivations which are identical once all intermediate subscript occurrences have been deleted are said to be almost alike. I would like to be able to claim that derivations which are almost alike or congruent (i.e., related by the transitive closure of the union of (a) and (b)) are indistinguishable. As will emerge, however, only derivations which are both can be identified. Such derivations are
A formal distinction between open and closed assumptions is drawn below.
A Multiple-Conclusion Calculus
83
said to be alike, and in what follows a derivation will be considered well-defined if it has been specified uniquely up to likeness. (2) The quasi-derivations II and II' are said to be compatible if no subscript with an intermediate occurrence in II occurs anywhere in II', and vice versa. For the remainder of this work, I will tacitly assume that all quasiderivations II satisfy the following condition: no subscript which occurs on an assumption or conclusion of II has any intermediate occurrences in II. This guarantees that, for any II and II', it will always be possible to find a pair of mutually compatible quasi-derivations which are like them. (3) I will write
n
ii Bk for n*(An/i),An
Bk where n occurs nowhere in II*, and II* is a quasi-derivation like II which contains no intermediate occurrences of k. Similarly,
n Bk
Cj
will denote
|n**(^n/i),An,^An n c j where n occurs nowhere in II**, which is like II except that it contains no intermediate occurrences of k or j . (4) Let 111 and n 2 be quasi-derivations with conclusions of the form A{ and S j , respectively. I will write U
as
\An/i),An,
A {£>m/j)i m, n**(n ^ R
11
n
n
X-Y
Ft 1 m
^
\
J-
ni n 2 Ai
B3
Ck where II* and II** are like III and n 2 , respectively, except that neither contains an intermediate occurrence of k and they are compatible with one another; furthermore, m and n are distinct subscripts which occur nowhere in II* or II**.
84
Normalization,
Cut-Elimination
and the Theory of Proofs
(5) The notations Bk A{
Bk
n
Cj
Az
Ai
Bk
n
ni n2
Cj
are dual to the above—the duality being between assumptions and conclusions, and as such do not require separate explanation. Granted that the application of a rule is to be interpreted in terms of combination, there are still a number of different ways to read the axioms and rules of Chapter 4 as the clauses of a definition of (propositional) derivation, depending upon how these rules are to be applied. It is most natural to think of them as being applied downwards so that the definition becomes: Axioms: For all A, n and m, A7^ is a derivation of Am from An? Rules: (1) If II is a derivation of A, Am from T and II' is a derivation of A', Bn from r", then
n
ir
AABP is a derivation of A, A , A A Bp from T, Ff for all p. (I do not intend to exclude the possibility that A A Bp is already a member of A or A'.) (2a) If n is a derivation of A, A A Bp from T, then 7
n AABP is a derivation of A, Am from T. (Again, Am may be a member of A.) With the possible exception of rule (5), it should be obvious how the remaining clauses are to be formulated, so there is no need to list them here. As for (5): (5) Let II be a derivation of # m , A from T, then
(4,/»)n Bm A->BP
M
is a derivation of A —• £?p, A from T — An, where q is a subscript (distinct from p) which occurs nowhere in II, and A —• Bp ^ is the label A —> Bp augmented by some notation which indicates that any 2 I have changed the form of the axioms slightly so that they will satisfy the definition of quasi-derivation given above.
A Multiple-Conclusion Calculus vertex so labelled discharges all assumptions of the form Aq. write
85 I will
n Bm
A-*BP
{An}
for
(Aq/n)n Bm A->BP
<«>
Two remarks: (1) It is apparent that the derivability relation characterized by this set of derivations coincides with the consequence relation determined by the rules of Chapter 4. (2) If no restrictions had been placed on subscripts, or indeed if subscripts had been omitted entirely, the result would have been equally satisfactory from the point of view of derivability. I mention this to emphasize that the complications these involve have less to do with the multiple-conclusion approach than with the use I want to make of it—in particular, to study normalization and compare the derivations of different calculi. Formulations of more conventional systems suitable for these purposes would be no less complicated. As mentioned above, there are other ways in which the class of derivations might be defined. One possibility is to apply the rules upwards. A slight complication arises because rule (5) will now allow closed assumptions to be introduced into a derivation. Furthermore, when this happens, it is inconvenient to insist that there are no other closed assumptions of the form in question already present in the derivation. For the purposes of this paragraph, therefore, let the subscripts on closed assumptions not be classified as intermediate. (Of course, this alters slightly the meaning of compatibility, as well as the various notations explained in terms of it.) With this proviso, the clauses of the definition are easy to state. For example, clause (2a) reads: If II is a derivation of A from T, Am then
AABp
n is a derivation of A from F if every occurrence of the assumption Am 3 It is not hard to make this description more precise and to spell out a particular labelling procedure, but the above should be sufficient. Also, it goes without saying that the definition of quasi-derivation is now modified to include labels of this kind, and that an assumption is closed if it is discharged by some vertex.
86
Normalization, Cut-Elimination and the Theory of Proofs in II lies above a vertex labelled (A A B) —• Cq ^ and from T, A A Bp otherwise.
(for some C, ),
Similarly, clause (5) reads: If II is a derivation of A from r , A —> Bp and n appears neither as a subscript nor as an annotation in II, 4 then Bm A-+Bp
<»)
n is a derivation of A from T if A = B and n — m, and from T, # m otherwise. It is a routine matter to write out the remaining clauses. The class of derivations obtained in this way is not identical with the previous one, although the two are deductively equivalent. Except from a heuristic point of view, the above method of constructing derivations is perhaps little more than a curiosity. Of more interest is the class of derivations which results from applying the introduction rules, i.e., rules (1), (3) and (5), downwards and the elimination rules, i.e., (2), (4) and (6), upwards. By closing this class under an operation CUT such that the result of applying CUT to
r n A,Ai and AiX IT A' is a derivation of A, A7 from T, V whenever all occurrences of Ai as an assumption in IT' are minor premises of an application of rule (6), a natural interpretation for the logical rules of (the propositional part of) LK can be obtained. 5 No matter how CUT is defined, this class of derivations, call it ND, is not co-extensive with either of the other two. The cut-elimination theorem, however, ensures that it is deductively equivalent to them. To give a more complete account of the relationship between Gentzen's calculi and the above, quantifiers have to be considered and a substitution operation appropriate to multiple-conclusion derivations defined. The latter presupposes that the set of such derivations has been fixed. For this reason I will not pursue the idea of the upward application of rules any 4
Weaker restrictions on n are possible, but these are convenient. This is so once some provision has been made for negation. For the version of LK found in Appendix B it is enough to require that rule (7) be applied only to axioms. More traditional formulations of the sequent calculus are best handled by introducing additional rules corresponding to the left and right rules for this connective. 5
A Multiple-Conclusion Calculus
87
further here, and will subsequently propose an alternative interpretation for left rules. Turning now to quantifiers, there is no doubt about what form their rules should take, namely: (9Q) A(b)n (10 g ) QxA(x)n QxA(x)m A(b)m where Q is V or 3. The only questions concern the restrictions to be placed upon them (apart from the obvious one that x be free for b in (9^)). Briefly, in some suitable sense b must be arbitrary in (9V) and new in (10 3 ). Also, the inference from V3 to 3V must be blocked. This is usually accomplished by placing some rather cumbersome restrictions on the rules in question, which in the present case would need to be supplemented by restrictions on some of the other rules. 6 It seems preferable, therefore, to adopt a slightly different approach. Recall that the vocabulary of our language C includes both variables and parameters, and that variables can only occur bound in a formula while parameters can only occur free. (C is assumed to be countable.) Expressions which are like formulae except that they may contain free variables are called quasi-formulae. I will write V for the set of variables of C and P for its set of parameters; V 0 P = 0. As before, a, 6, c , . . . will range over the members of P , but I will now use x,y,z... to range over V U P and reserve v,vf,vi,... for V. Now, let U and E be the sets of quasi-formulae of C whose principal connective is a universal and existential quantifier, respectively; let / : E U U »-> V, and consider the following rules. 7 6 T h e rules I have in mind are the two premise ones, and the need to place restrictions on them is a result of the fact that derivations here are not written in sequence form. For example, the requirement that b occur nowhere in the derivation of the premise of an application of (10 3 ) is not sufficient to prohibit the invalid combination
3xA(x)m 3xB(x)p A(b)n B(b)q A(b) A B{b)r To exclude this, rule (1) must not be applied to premises containing occurrences of the same parameter when at least one such occurrence in each premise comes from an application of (10 3 ). 7 For the record: (1) In (9 V ), A(v') is the result of replacing every free occurrence of f(VvA(v)) in A(f(VvA(v))) by v'', where (unless v' = f(tivA(v))) v' does not occur in
A(f(VvA(v))). (2) In (9 3 ), v' does not occur in A(x) (unless v' = x). (3) In (10 v ), A(x) is obtained from A(v') by replacing every free occurrence of v' by x. Also, if x is a variable, it is free for v' in A(v'). (4) In (10 3 ), A(f(3vA(v))) is obtained from A(v') by replacing every free occurrence of v' by f(3vA(v)). (Condition (2) on / below ensures that, unless f(3vA(v)) = v', f(3vA(v)) cannot occur in 3v'A(v').)
88
Normalization, Cut-Elimination and the Theory of Proofs
v iy )
(9 B)
A{f{VvA{v)))n WA{v')m
_ ^ _ 3v'A{v')m
v UU
3* v( 1 0 y
)
WA(v% A(x)m
* ^ V A(f(3vA(v)))m
By placing some simple conditions on / and adjoining these rules to the propositional ones given earlier, a formalization of classical predicate logic is obtained which is adequate in the sense that, if T and A are sets of formulae, there is a derivation of A from F iff A\ T —> W A is classically valid. The following conditions suffice for this purpose: (1) / i s one-one. (2) There is an enumeration {va)a<(3 of Range(/) such that, if /(^4) = v7 and va occurs in A, then a < 7. (1) ensures that f(VvA(v)) is suitably arbitrary and that f(3vA(v)) is sufficiently new, whereas (2) rules out the possibility that, for some A{v,v\), v', v", f(3vA(v,v')) = v" and f(VviA(v",vi)) = v'—thus blocking the inference from V3 to 3V. The idea is that f(3vA(v)) can be interpreted as an individual which satisfies A(x) and f(\/vA(v)) as one which fails to satisfy it, whenever such individuals exist. More precisely, if (M,a) is any model for £, it can be expanded to a model ((.M, (v Q ) Q<j g), a), where /3 is given by (2) above, for the language obtained from C by treating the members of Range(/) in their free occurrences as constants. I take a model for £ to be a structure M = ( M , . . . ) of the appropriate similarity type together with an assignment a : P *-+ M. Also, for convenience, I assume that all models have the natural numbers as their domain. The £ a 's are defined by induction on a (a < (3) as follows: Suppose that f(A) = va(1) If A is of the form 3vB(v) and there exists m € M such that ((A*,(v 7 ) 7
\= B(c) ->-L [m]
where c does not occur in VvB(v), let va be the least such m. (3) In all other cases let va = 0. (M,a) \= B(c) [m] means that (M,a!) (= B(c), where a' is the assignment which is like a except that it takes c to m. The rules are obviously sound under this interpretation and any model for C can be expanded in this way. The adequacy of the rules therefore
A Multiple-Conclusion Calculus
89
follows. It is also worth mentioning, perhaps, that a function satisfying the conditions specified above is easily defined for any £; in fact, it can be constructed by a routine adaptation of the technique used in the Henkin completeness proof to extend a consistent set of sentences to a saturated one. 8 As with rule (5), in the presence of conclusions other than those operated upon by the rule, (9V) is not intuitionistically valid. For example, it allows us to derive nnvx 1
I "' l ; v [6
}
\/v(A(v) V B)m A{f(WA(v')))VBn A(f(WA(v')))p Bq
,
\/vA(v) V Bs
Furthermore, even in the absence of such additional conclusions, (103) and (5) taken together lead outside the confines of intuitionistic validity. This can be seen from the following derivation:
(5.2)
Bm B -> 3v'A{v')n (6) 3, 3v'A(v% (10 3 ) ^ A(f(3vA(v))) J B -> A(f(3vA(v)))r {Bm} } ^ 3v'(B - A{v'))8
It seems reasonable to claim that (10 3 ) itself is intuitionistically valid and that rule (5) is problematic. Certainly, when the rules are applied downwards, the only way to obtain intuitionistic logic seems to be by restricting the latter—and, of course, (9V)—rather than (10 3 ). A purist might argue that the individual asserted to exist on the basis of an intuitionistic proof of 3vA(v) depends on the derivation of this conclusion and not simply on its form. Even a version of (10 3 ) modified to meet this objection, however, would not prevent the derivation of 3v(B —• A(v)) from B —• 3vA(v). It may seem anomalous that a rule which has traditionally been claimed to express the meaning of intuitionistic implication turns out to be invalid. This is so, however, only when it is applied to
This interpretation notwithstanding, I should emphasize that / is just an auxiliary syntactic device. It is not even part of the vocabulary of £, and certainly not a logical symbol comparable to Hilbert's e-symbol. 9 Notice that, if the definition of ND is extended to include rules (9^) and (10^), the former being applied downwards and the latter upwards, there is no need for / or for the use of quasi-formulae. It is sufficient to require that the parameter generalized in (9 V ), or introduced by (10 3 ), not occur in any other assumptions or conclusions of the derivation.
90
Normalization, Cut-Elimination and the Theory of Proofs
In the light of my earlier remarks on the subject, the definition of substitution should come as no surprise. The only questions will concern its properties. Given any graph Q, let c{Q) be the cardinality of its set of vertices. Furthermore, let D be the class of derivations generated by rules (1)-(10^) when they are applied downwards and their application is interpreted in terms of combination in the manner outlined earlier. Definition 5.2 Let II be a quasi-derivation with conclusions of the form Ai and let II' be a member of D, [U/Ai]Uf, the result of substituting U for each occurrence of the assumption Ai in IT, is defined by induction on c(II / ) as follows: Case 0:
;
c(II') = 1
ifn = i4i, [n/^ii'is n. If IT 7 ^ , [n/Ai]TV is IT. Now suppose c(lT) = n (n > 1) and that [ I I / ^ ] ! ! " has been defined for all II" such at c(II // ) < n. There are a number of cases to consider, depending upon the form of II', but these fall into four groups. Cases 2, 3, 7, 9^ and 10^: IT is of the form IT
zr~ R where R is any single-premise, single-conclusion rule except (5). Let p be an index which occurs nowhere in n or II", then
[nMt](n"(cp/n)) \a/Ai]n' = Case 4:
cp
II' is of the form
n" C\tDn This is just a notational variant of the preceding cases. Cases 1 and 6: IT is of the form iii n2 D where R is rule (1) or (6). Let r and s be indices which occur nowhere in II or 11', then
[nM,](n2(c,/n))
rnM<](n1(flr/m)) [U/AijW =
Br Dq
A Multiple-Conclusion Calculus Case 5:
91
II' is of the form
n" Cn B —• Cm {Bp} where p occurs nowhere in II and only on assumptions of the form Bp in II". Let q be an index distinct from p which occurs nowhere in II or IT, then \n/Ai](TL»(Cq/n))
[n/Ai]n' =
cq B —> Cm
{Bp}
The foregoing definition may fairly be claimed to express the meaning of substitution in the context of multiple-conclusion derivations. Nevertheless, it might be thought that substitution could simply have been identified with combination. D, however, is not closed under arbitrary combinations. It is, of course, immediate that D is closed under substitution as defined above. Furthermore, if n is a derivation of A, Ai from T and IT is a derivation of A ; from T\Ai, then [ I I / ^ i T is a derivation of A, A' from r , H . It only remains to show: Theorem 5.1 The operation of substitution is well-defined. Proof. The problem here is that the various cases of the definition are not exclusive. They are easily seen to be exhaustive. The only doubt that may arise is on account of Case 5. Recall however that, if IIi A->Bm
{Aq}
and p is any subscript not occurring in II, then (Ap/qWx
n=
An A
• JDm
\Ap)
as well. In other words, if II is obtained by an application of rule (5), it can always be written in the form required by Case 5. The proof is by induction on c(lT), utilizing the fact that combination satisfies
(5.3)
p i , A,,n2],B7,n3] = [[ni,B„n3],A„n2]
provided that Ai is not among the conclusions of II3, nor Bj among those of n 2 . 1 0 The basis, c(II') = 1, is trivial. As for the induction step, there are Cf. the remarks following the definition of combination.
92
Normalization,
Cut-Elimination
and the Theory of Proofs
ten kinds of case to consider—some divided into subcases—corresponding to pairs of the four kinds of case in the definition. By way of example, I will sketch the case in which both representations of II' fall under the third of the four inductive clauses of the definition, i.e., the case in which
n' =
nx n2 n3 n4 B^ c_ = £ F_ D
G
(I omit subscripts to simplify the notation.) There are four subcases to consider.11
n"
Subcase 1: For some „ „ o hi
n" n4
ux n"
U2 = C E F G
and
n3 = B
C E D
n"
Given a derivation ~ „ let G , hi
~ = C
[n/411" [u/A]u4 E F and G
,-, = E
\n/A]U! [u/A]u" B C D
11
The point to be emphasized is that these are the only possibilities. It was to ensure this that I insisted on distinguishing between derivations which are almost alike and did not simply identify the application of a rule with combination. If IT is thought of as having been constructed by two different sequences of operations, say (ai)i
=
iii Ai
n2 B±
=
Ck
n3 Ai
n4 Bj_ Ck
then there is no 11^ such that
/n3 n 2 = \Ai
n^\ Bjfinl(B
Iii = II3, and II4 = Xl'^(Bjjn). It is straightforward to check that, had cases of this kind not been ruled out, substitution would not in general have been a well defined operation.
A Multiple-Conclusion Calculus
93
Then,
rix n 2 \a/A]_B
[n/A}Ux [u/A]n2
C_ =
_B
C_
L)
D
[n/A^x *(n") B
c
D 9(11")
[U/A}Tl4
E
by (5.3)
F G
n 3 n4 F
[n/A}E
G
n" Subcase 2: For some
R
„
n" n 4 ni = B E
n" n2
F
and
U3 = E B
G
C D
n" Subcase 3: For some „
R
n 3 n" ni = E
n" n2
F B
and
n4 = F B
G
C D
n" Subcase 4: For some „ „
n 3 n" n2 = E
nx n"
FC
and
G
IL, = B
C F D
These three are just notational variants of Subcase 1, and there are no additional complications associated with the remaining cases. • Finally, notice that substitution satisfies the following conditions:
(5.4)
[[u/AijnjBj]^ = [ii/Ai] ((n1/5>]n2)
provided that Bj is not among the conclusions of II nor Ai among the assumptions of II2, and (5.5)
[U/AillUjBj]^
=
[U1/Bj][Il/Al]U2
provided that Ai is not among the assumptions of II1 nor Bj among those of II. It is an easy matter to prove both (5.4) and (5.5) by induction on
c(n2).
94
Normalization,
Cut-Elimination
and the Theory of Proofs
Suppose now that Ai ^ Bj and that m,n occur nowhere in II, III or n 2 . I will write
n ni Ai
B3
n2 for [ n ( > l n / t ) M „ ] [ n 1 ( S m / j ) / 5 m ] ( ^ „ / i ) ( 5 T O / j ) n 2 . In view of (5.5) above, the latter can be described as the result of simultaneously substituting the conclusions Ai of II and Bj of IIi for the assumptions At and Bj, respectively, of II2. This notation generalizes in the obvious way to
ni
nn
4\" X n which denotes the result of simultaneously substituting the conclusions A% of Ilfc for the assumptions A\k of II (1 < k < n). I will also write
n A%
Bj
rii
n2
for [[U{An/l)(Bm/j)/An}(An/i)n1/Bm}{Bm/j)n2. to claim that
W/AAnjBjfo
I would like to be able
=
[[U/B^/A^,
provided Bj is not among the conclusions of 111 nor Ai among those of n 2 , and to describe
n A%
Bj
ni n2 as the result of simultaneously substituting the conclusions A% and B3 of II for the assumptions Ai of III and Bj of II2, respectively. Unfortunately, however, the latter is not in general equal to
n Bj
Ai
n 2 ri! This notation is, therefore, intended to represent consecutive, rather than simultaneous, substitutions—the order of substitution being indicated by the left/right order of the conclusions of the derivation being substituted. I will also sometimes write [ n / ^ j n ' as
n ir (The notation introduced above for substitution coincides with that for upward application in those cases where the derivation being substituted
A Multiple-Conclusion Calculus
95
has the form of one of the rules. For example,
AABm
n can be read either as the result of substituting the conclusions An of
for the assumptions An of II or as the result of applying rule (2a) upwards to the assumptions An of II. To resolve this ambiguity, I specify that henceforth the former interpretation is always the intended one.) Now that substitution has been defined, it is possible to establish the relationship between members of D and the derivations of the familiar Gentzen calculi, at least for those versions of them which do not involve thinning. In the case of the N calculi, this means excluding applications of V- and 3-elimination which do not discharge an assumption in the derivation of each minor premise. (That such applications involve a thinning procedure will be argued below.) Thinning is a troublesome feature of Gentzen's calculi which affects properties involving normalization and normal forms more than derivability.12 It seems best, therefore, to postpone discussion of this rule until we consider normalizability. I begin by considering NJ with a view to interpreting its rules in such a way that they generate a subclass NJ& of D. This, in turn, induces a structure preserving map between the derivations of NJ and NJE>. It should be obvious how to proceed except for a couple of points of detail. One of these has to do with subscripts. If they are included as part of the formalism of NJ, there is no problem. If they are not, however, each derivation of NJ must be associated with a subset of D whose members are congruent. Unfortunately, this relation is not really a congruence with respect to any 12 In the case of the L calculi, derivability is unaffected by the omission of thinning if the rules are formulated after the manner of Chapter 2 above, rather than Chapter 1, and the negation rules are replaced by axioms for _L. The absence of thinning, however, complicates somewhat the proof of the cut-elimination theorem since it requires the reduction steps to be supplemented by a pruning operation on derivations. A similar complication arises in the proof of the normalization theorem for TV if empty assumption classes are not allowed in applications of V- and 3-elimination. It was perhaps for some such reason that Gentzen included the rule, although his motivation in the case of TV might equally well have been to treat all the rules which discharge assumptions in a uniform way. (If —•-introduction is restricted so that it must always discharge an assumption, it will still be deductively equivalent to the usual formulation of the rule in the presence of the rules for conjunction. The restricted rule, however, spoils both the normal form theorem and the separability property.) I take "derivable from T" to mean "there is a derivation with assumptions from among the members of T." If it means "there is a derivation with assumptions T," it still remains unaffected by the absence of thinning (for the calculi with which we are concerned) although other nice properties, like separability, will fail.
96
Normalization,
Cut-Elimination
and the Theory of Proofs
operation, such as substitution or applying a two-premise rule, which combines two or more derivations. In the case of such operations, therefore, it is necessary to specify the representative from each congruence class to which they are to be applied. 13 There is no particular difficulty involved in doing this. So, for the purposes of the comparison, it makes little difference whether subscripts are included as part of NJ or not. The other point concerns the proper parameter in an application of Vintroduction or 3-elimination. There are two possibilities. The first is to allow quasi-formulae to figure in NJ derivations and modify the restrictions on these rules to ensure that the proper 'parameter' is always an appropriately chosen free variable. The second is to exclude quasi-formulae and introduce a one-one correspondence g between parameters and variables. The rules can then be left unchanged except for the requirement that the proper parameter of an application of 3-elimination, for example, with major premise 3vA(v) must be g(f(3vfA(v'))) for some v'. (A similar remark applies to V-introduction.) Since matters have been arranged so that there will always be infinitely many parameters available for each such application, little if anything is lost by this restriction. Nothing much depends on which alternative is chosen, but I prefer the second and adopt it below. With the exceptions of V- and 3-elimination, the schematic descriptions of the rules for NJ correspond to my notation for the application of a rule in multiple-conclusion logic. Hence, they can be read ambiguously as generating the conventional tree derivations or members of D. As for Vand 3-elimination, their application is interpreted as follows:
n
[M [Bk]
nD Aw
nnx
J
p
nn 2
^q
corresponds to
^q
AW Bp Ai Bk 111 Lq
a
<7
112 Gq
and n -, ,, N 3vA{v)p 13 A
n
Wfl(x))J
3vA(v)p
iii „ Cr
corresponds to y
A(x)a ^/q Cr
similar procedure is necessary in NJ itself. For example, if each of
n
, IT
A and B contains an assumption class involving C, some way is needed to indicate whether these two classes are to be amalgamated or kept distinct in
n ir A B_ AAB
A Multiple-Conclusion Calculus
97
(where x = f{3v'A(v')) and Ili is obtained from 111 by replacing all occurrences of g(x) which are linked to those in assumptions of the form A(g(x))q by x).14 I write NJQ for the class of derivations generated by the rules of NJ interpreted as instructions for the construction of multipleconclusion derivations. There is then an obvious isomorphism between the derivations of NJ and the members of NJp (or congruence classes thereof— depending upon whether subscripted formulae are allowed to appear in NJ derivations). If almost alike members of NJ& are identified, this becomes a homomorphism from the derivations of NJ onto those in NJp. I shall have more to say about these correspondences later. I turn now to LK; there is no need to give a separate treatment of LJ since it is simply a special case of LK. As before, everything is very straightforward except for a couple of points of detail. These are for the most part concerned with the formulation of LK to be chosen. If sequents are taken to be of the form T h A, where T and A are sets of subscripted formulae, and the negation rules are replaced by axioms for _L, the rules of LK are easily interpreted as generating members of D. (Again, there is a small point to be considered concerning the proper parameter of an application of V-right or 3-left. The situation is exactly analogous to the case of NJ, however, and I propose to deal v/ith it in the same way.) I will sketch such an interpretation for this formulation, and then indicate very briefly how it can be adapted to other versions of the sequent calculus. There are two kinds of axioms, namely Ai h Aj
and
_Ljh Aj
for all A, i and j . These are interpreted as the derivations A)
and
-^
respectively. As for structural rules, interchange is redundant and thinning has been excluded for the time being; cut corresponds obviously to substitution, and contraction is just a special case of resubscripting an assumption or conclusion. (It is clear that D is closed under such resubscripting in the sense that, if II G D, then (Ai/j)W and W(Ai/j) are both in D for some IT which is like II. In case this is properly an instance of contraction, i.e., in case Ai is already among the open assumptions or conclusions of II, IT can be taken to be II.) Right rules are handled in the same way as the introduction rules of NJ, so it only remains to consider the left rules. To ensure that D is closed under these, a way must be found to interpret 14 Let a and {3 be occurrences of a parameter or free variable in a derivation {i.e., of the same parameter or variable), then a and (3 are said to be linked if (a,/?) is in the least equivalence relation 1Z satisfying the condition:
(a,/3) £ K iff the formula occurrence which contains /3 lies immediately below the one containing a.
98
Normalization,
Cut-Elimination
and the Theory
of
Proofs
them in terms of downward applications of multiple-conclusion rules. This is accomplished with the help of substitution. I give the examples of Vand —•-left below; it should be apparent how to proceed in the remaining cases:
n
AvBk
rr
Aj,r\-A
Bj,V
AvBk,r,r
\-A'
Ai
corresponds to
Bj
ir
n A
h A,A'
r
A'
and
n
r h A, Ai
BhV
r n
IT
h A'
corresponds to
A
Ai
,4->£fc,r,r'hA,A'
A
u~^
Bk
Bj
r,
IT A'
Let LKD and LJD denote the classes of derivations generated by the rules of LK and LJ, respectively, when interpreted as above. No real difficulty arises if sequents are taken to be of the form 6 H ^ , where 0 and \£ are sequences of unsubscripted formulae, and rules of interchange are added to LK. A sequent in this sense can be associated with each member of D as follows: Given an enumeration e of the formulae of £, define Ai > Bj
iff
i > j , or i — j and A comes after B in e.
For U € D, replace all free occurrences of variables in the assumptions or conclusions of II by their images under g~l, then place the formulae thus obtained from the assumptions in decreasing order (with respect to > ) to the left of h, and those which result from the conclusions in increasing order to the right. Finally, delete all subscripts. The rules of LK can now be interpreted as generating the equivalence classes of members of D obtained by identifying congruent derivations associated with the same sequent. One need only ensure that a formula occurrence introduced by the application of a rule is assigned a sufficiently large subscript, and allow for the necessary resubscripting in the case of the two premise rules. Interchange is taken care of by one or more changes of subscript. If the axioms for _L are replaced by left and right rules for negation, there are two possibilities. One is to define negation in terms of _L and treat its rules as special cases of the corresponding ones for implication. The drawback to this approach is that LJ cannot be obtained from LK by allowing no more than one formula to appear on the right of a sequent; in addition, sequents of the form 0 \-L,A or 0 h A, _L are needed. 15 The 15 T h e resulting system [when restricted in the manner indicated in the text] is an adequate formalization of classical [intuitionistic] logic in the sense that @ h ^ is provable
A Multiple-Conclusion Calculus
99
other alternative is perhaps more natural. It involves treating negation as primitive and augmenting the multiple-conclusion rules of Chapter 4 by the analogues of ->-left and -<-right. This can be done conveniently, however, only after the introduction of thinning, so I shall omit a discussion of these rules here. I turn now to NK. Whether we follow Gentzen and add as axioms all instances of the law of excluded middle, or adopt Prawitz's classical negation rule, it is easy to extend the interpretation of NJ to NK, as the following shows:
A V (A —>JL)i
corresponds to
A.p
AV{A^
Ay Lm -L« ±)i A^±r {n} AV(A-+±)i
and [Am] AW lp
[A - ± „ ] .n
Ai
^
A —± n
corresponds to
n -u
-Lfc At
{m}
I will use NKID and NK2D, respectively, to denote the extensions of NJ& obtained by means of each of the above; NK& will denote ambiguously NK\D or NK2D- It is clear that NK\D 7^ NK2D> For example, the particular derivation of A V -iAi chosen to interpret this axiom is not in NK2D< and [An] AV ±n Ap
J-q
A —>_Lr{n} As A.p
is not in NKID. Another alternative suggested by Gentzen is to add the double negation rule -1-1,4
iff A 6 —• V ^ ' is classically [intuitionistically] valid, where V is a non-empty subsequence of \t obtained by deleting 0 or more terms of the form _L. If # is never allowed to contain more than one formula, the result is a system of minimal logic. Thus, we can dispense with the axioms for X without adding any new rules or axioms and still characterize, albeit in a slightly artificial way, minimal, intuitionistic and classical logic. Similar considerations would allow us to dispense with rule (7) at the expense of some artificiality.
100
Normalization, Cut-Elimination and the Theory of Proofs
The obvious interpretation of
n (A ->±) - ± i Aj IS
[An] AV±K Aj
ip A->J. g {n}
n (A^±)->±j
in The resulting extension of NJD, however, is just a subset of NK2D' As a matter of fact, none of these extensions seem particularly natural. What seems to be essential to each of them is the use of the configuration [Am] AV1P Ai
JL2A-+±n{m}
in constructing derivations. But the obvious way to do this, while preserving the single-conclusion character of NK, is to admit the rule [A]
hA]
n n' c c c
whose interpretation will be
[A, AV ±p A.i
n
ck
-Lj
A ^l„{m}
n' ck
The resulting extension of NJ&, although obviously a proper subset of LKD, properly includes NK\D U NK2DThe uniform interpretation of Gentzen's calculi in terms of D helps us better understand the similarities and differences between their respective rules. Furthermore, it allows relationships between the derivations of the various calculi to be expressed in a rather satisfactory way. In particular, it is a routine matter to verify (by induction on the rules in each case) that NJD = LJD C NKD C LKD C D . For the propositional parts of LKD and D, this last inclusion can be replaced by an equality. The reason why LKD ^ D is that the restrictions on
A Multiple-Conclusion Calculus
101
(9V) and (10 3 ) are more generous than those placed on V-right and 3-left (or on V-introduction and 3-elimination, for that matter). Clearly, there is no sequent derivation corresponding to A(f(VvA(v)))n WA(v')m or 3v'A(v')n A(f(3vA(v))) m However, even members of D which contain no quasi-formulae among their (open) assumptions or conclusions may not be in LKp. Consider, for example,
ni 3v{A(v)AB{v))n A(x) A B(x)p A(x)q
n2
BvijAdv) A B(vi))m A(x)AB{x)r B(x)a A{x) A B(x)t 3v2(A(2v) A B(v2))u where x = f(3v3(A(v3) A Bfa))) for some V3.
6
Reduction Procedures The interpretation discussed at the end of the previous chapter can, of course, be extended to the various reduction procedures for TV and L calculi. I shall not spell out how this is to be done since it is a purely mechanical matter to translate Prawitz's reduction steps for NJ, for example, into operations on the members of NJD- I shall, however, assume such a translation in the discussion which follows. In addition, the rules for generating D themselves suggest a method of normalization. The possibility seems to exist, therefore, for a uniform treatment of reduction in all five calculi. It is to this topic that I now turn. Suppose U £ D and A is any formula. There is no need to refer to the rules in order to explain what it means for an occurrence of A to be maximal in II. Definition 6.1 An occurrence of A is maximal in a derivation II if there is a subformula B of A such that B occurs as one of its immediate predecessors and successors in II. In other words, a maximal occurrence is one which appears in a configuration of the form B
i
A
I
B (I will not bother with subscripts for the moment.) A reduction step deletes the maximal occurrence of A and identifies the two occurrences of its subformula B. This is essentially what all the familiar reduction procedures accomplish. Unfortunately, there are various complications which obscure somewhat the basic picture. These are discussed in (l)-(4) below. (i) Reduction steps are thought of as operations on derivations. This means, in particular, that the result of applying one to a derivation must 102
Reduction Procedures
103
itself be a derivation. The procedure described above does not satisfy this condition, however, since matters have been arranged so that the application of a rule may involve more than one occurrence of a formula. By itself, there is no reason why this should lead to different reduction procedures for the different calculi—except in the case of NK. (This is because NKp is the only class under consideration which is not closed under all the usual reductions.) In addition, however, reductions are required to preserve as far as possible the local character of the above procedure and this means, in effect, defining them relative to a set of rules: a reduction removes a certain kind of inference step from a derivation, namely all the occurrences of a formula which figure as the premise of a particular application of a rule together with all the occurrences of the conclusion of this application. But different sets of rules do suggest different reductions when the latter are conceived in this way. For example, suppose II, II' G NJp and IT results from II by removing an inference step of this kind while leaving the rest of the derivation intact; even though n and IT must also be in Up, there is no reason to suppose that the relationship between them can be expressed in such simple terms relative to the rules of L J. This dependence on the rules seems to be the major source of the differences between various reduction procedures. (2) Because there are rules with more than one premise or conclusion, the configuration displayed above may occur as part of a larger configuration having one of the forms B
B
B
i
i
\/
C
C
B
A
A
A
A
/\
S\
I
I
\S
B C C B B B Once the vertex labelled A has been deleted and the two vertices labelled B have been identified, the question arises as to what should be done with the vertex labelled C and that part of the derivation connected to it. There are basically two alternatives: one is to delete these as well, the other is to retain them while ensuring that C does not appear as an additional assumption or conclusion in the resulting derivation. For the latter an operation is needed which allows redundant formula occurrences or derivations to be adjoined to a given derivation. Out of deference to tradition I propose to call this operation thinning. Although the most satisfactory procedure would seem to be one which pruned as much of the derivation as possible, there are a number of reasons for adopting the latter alternative, or something approximating to it. In the first place, the inclusion of thinning may actually simplify reduction in a calculus. This is because, if we consider the sequence (or possibly, tree) of
104
Normalization, Cut-Elimination and the Theory of Proofs
operations by which a derivation is constructed, V- and 3-elimination (in NJ and NK), left rules (in LJ and LK) and right rules (in LK) are all such that applications of them may be made redundant by the removal of an earlier step of the construction. So, without thinning, reduction steps cannot operate simply on some initial subsequence (or subtree) of the construction. (In terms of the traditional representations of the derivations of Gentzen's calculi, the situation can be described by saying that applications of these rules which lie below a given inference step may be made redundant by its removal. As a result, a reduction step cannot simply operate on the subderivation terminating with the configuration to be removed, while the rest of the derivation remains unchanged, unless redundant applications of at least some of the rules are allowed.) Considerations of this kind are probably sufficient to explain Gentzen's treatment of thinning. It does not seem to be based on any general principles but, on the contrary, to be rather ad hoc and designed simply to facilitate the proof of the normal-form or cut-elimination theorem. 1 A second reason to allow some thinning is the well-known fact that, in its absence, normal forms are not in general unique. More precisely, they are not unique in any fragment which contains both conjunction and disjunction, as the following configuration illustrates: A Ay B A B C BAC C Although the uniqueness of normal forms is an important desideratum, it did not become one until relatively recently and, therefore, cannot properly be used to explain features of the traditional reduction procedures. Finally, without at least thinning on the right, the cut-elimination theorem will not hold for LK; there will, for example, be no cut-free derivation
of\-A,A-+B. There is no doubt that the treatment of thinning is another source 1
In the case of natural deduction, the fact that redundant applications of only two of the rules are allowed suffices to justify the claim that Gentzen's treatment is ad hoc. In the case of the sequent calculus, it is the use of thinning in the cut-elimination procedure which is disturbing. Neither alternative mentioned in the text is employed consistently; instead both of them are permitted, as well as everything in between—that is to say, using the example in the text, the whole subderivation connected to C may be deleted, selectively pruned, or left entirely intact. Its final form is determined by the position of the inferences from which it is constructed relative to the cut being eliminated. The principle is that, when an inference becomes redundant as a result of a cut-elimination step, it is deleted if it lies above the cut in question and retained if it lies below it. One consequence of this is that derivations which differ from one another in what are usually thought to be insignificant ways ( i e . , when one results from the other by a trivial permutation of inferences) may reduce to entirely different cut-free forms.
Reduction Procedures
105
of differences between reduction procedures. From the present point of view, however, it should not be. We can hardly claim to have understood the significance of reduction if we are unable to decide on general grounds what to do with those parts of a derivation made redundant by the removal of maximal formula occurrences. Furthermore, our decision on the matter should not simply reflect what is convenient given the format of a particular set of rules. (3) A third complication arises from the fact that the removal of an inference may add open assumptions or conclusions to the non-redundant part of a derivation. For the members of D and its various sub-classes, this can only occur in the reduction of maximal occurrences of an implication. There is no need to dwell on this case, however, since complete unanimity exists on how it is to be handled: such a reduction involves the removal of an application a of rule (5) followed by an application (3 of rule (6), so the derivation of the minor premise of /? can be substituted for all the assumption occurrences reopened by the removal of a. (Negation treated as a primitive and governed by rules analogous to -i-left and —>-right provides another example of a connective which requires a reduction step involving this kind of complication.) Notice that, were it not for this feature of the step for implication, reduction would be a trivial matter. This is especially evident for the members of D since all the other reduction steps actually diminish the size of the derivation to which they are applied. (4) The last point I want to raise is connected with the first. Call a collection X of occurrences of a formula in a derivation a premise or conclusion occurrence if its members constitute together one of the premises or conclusions, respectively, of a single application of some rule. If X is a premise occurrence, it is clear that its members need not all belong to the same conclusion occurrence (even if they are all intermediate) which means, in particular, that X may contain both maximal and non-maximal occurrences. The procedure outlined in (1) must, therefore, be supplemented if it is to succeed in removing all maximal formula occurrences from a derivation. For example, suppose X = X\ U X2 where X\ and X2 are conclusion occurrences of different rules, the members of X\ are maximal while those of X2 are not, and X is a premise occurrence of an application a of rule R. Clearly, what is needed is a way to replace a by successive applications a i and 0:2 of R which differ from it only by using the premises X\ and X2, respectively, instead of X. (X\ together with the conclusion of ot\ can then be removed in the manner described in (1) above.) In case R is a one-premise rule, this is accomplished by replacing the derivation in question by one which is almost alike. In general, however, two-premise
106
Normalization, Cut-Elimination and the Theory of Proofs
rules require something more complicated. In any event, the above demonstrates the need to incorporate some method of splitting up inferences into the reduction procedure. This need is another source of differences between reduction in N and reduction in L. In both kinds of system it is met by steps which allow the permutation of inferences. The effect of the permutative conversions in N, however, is to allow a premise occurrence to be split only when it contains some maximal members. On the other hand, there is no such restriction in L, where the fact that cuts may be permuted upwards past any other inference legitimizes every kind of split. As in the case of thinning, these differences are most plausibly explained by reference to the format of the rules and what is convenient for the proof of the normal-form or cutelimination theorem. This state of affairs, however, is no more satisfactory than the corresponding one in (2). We ought to be able to decide on the basis of general principles, and independently of any particular formalism, which inferences may be split (without affecting whatever properties are preserved by reduction) and which, if any, may not.
Let me conclude this discussion with a couple of observations. All the problems associated with establishing a correspondence between cutelimination and normalization derive from the issues raised in (2) and (4) above, i.e., from the need for thinning and permutative reductions. Because these are not treated in an altogether satisfactory manner in the traditional accounts of reduction, it would be desirable to provide a treatment which is in some sense more natural and, in addition, applies equally to both N and L calculi. By itself, this does not seem to be an especially difficult task; what is harder is to accomplish it in such a way that uniqueness of normal forms and the strong normalization property are preserved. I turn now to a description of possible reduction procedures for the members of D. In view of my previous remarks, a necessary preliminary step is to define an operation corresponding to thinning. To this end, I propose to augment rules (l)-(7) of Chapter 4 by the following:
It would seem more natural to formulate these rules with Ak replaced by A{. The resulting class of derivations would not be closed under contractions however. Thinning is essentially a method of combining derivations together, while at the same time dispensing with some assumptions or conclusions. There are at least two ways to represent it. The one I have chosen is to attach the thinned derivation (or formula) to the occurrences of a conclusion. The drawback to this approach is that there is a measure of artificiality in-
Reduction Procedures
107
volved in interpreting the usual thinning rules since these make no reference to a particular assumption or conclusion by means of which the thinned formula is attached to the rest of the derivation. The alternative is to do without attachments of this kind altogether and allow derivations to be disconnected graphs; roughly speaking, derivations will be constructed by taking sets of derivations in the old sense and discharging some of their assumptions or conclusions. Although superficially attractive, this approach is fraught with difficulties. It becomes necessary to redefine such basic notions as likeness, combination and substitution. Substitution, however, is not easily defined for derivations of this kind without sacrificing some of its basic properties. It must either be defined relative to some construction tree of the derivation being substituted into, or distinctions must be made between a variety of derivations all of which are essentially alike. (An example of the kind of undesirable distinction I have in mind is that between {II} and {II, IT}, where II and IT are alike and compatible.) It is technical problems of this kind which have led me to prefer the thinning rules given above. Let DT denote the class of derivations which is like D except that it is generated by the rules (LT) and (RT) in addition to (l)-(7), (9Q) and ( 1 0 Q ) . The definition of substitution is easily extended to the members of DT for, although new cases arise, these fall within the groups specified in the definition. In particular, case (RT) can be treated with Cases 1 and 6, and case (LT) can be treated with Case 4. A similar remark applies to the argument that substitution is well defined: new cases arise, but no new kinds of case. In view of this, I shall not bother to rewrite the definition, but will take it for granted that substitution into a member of DT has been properly defined. Furthermore, the notation introduced earlier in the context of members of D will be used unchanged for the derivations of DTThe rules of Gentzen's N and L calculi can be interpreted in DT without the restrictions required by D. In the cases of NJ and NK, this involves extending the earlier interpretation to cover applications of V- and 3-elimination which do not discharge any assumptions in the derivation of the minor premise (or which do not discharge assumptions in both of them in the case of V-elimination). The idea is simply to use (LT) to add assumptions of the appropriate form when these are lacking. I consider the case of 3-elimination below. It should be obvious how to apply the same technique to the troublesome cases of V-elimination: Using the same notation as before,
n z\ Af IV \
M ))q
^k
t r
corresponds to
A(x)q
108
Normalization,
Cut-Elimination
and the Theory of Proofs
where IIJ is n^ if rii contains occurrences of A(g(x))q as an assumption, and U{ is A[X)g
Cr
otherwise. To interpret LJ and LK, provision must be made for left and right thinning, which I take to be the following pairs of rules: . (°)
A. a. r u A Ai,Bj,r\-A
Bk,T\-A
(°)
r u A p. A. r\-A,Bi,Ai
... r\-A,Bk (v6)' Ai.ri-A.B,-
and 7 (bv>
Bj,T\-A,Ai
Then, A*
n Bfc, T h A AuBj,T\-&
Bk
corresponds to
Bj T
n
A
r n
n r h A, 5fe
corresponds to
^,rhA,B,
Aj
r h A, 5fe
corresponds to
A
Bk
w^-
B,
n Bfc, T h A B.-.ri-AMi
A
r n
n n-A,*,,*
Bk
corresponds to
rs fc ^ n A
When k = j m the above, these rules are obviously equivalent to the familiar thinning rules for the sequent calculus. Their (b) versions do not add to the deductive strength of the calculus, for it is easy to demonstrate that without them h A AihA is a derived rule. In other words, given
n I-A
Reduction Procedures
109
it is always possible to construct IT without using the (b) versions of left or right thinning. The argument is by induction on the number of inference steps which follow the last application of —»-right in II and is routine. 2 As for their resubscripting function, it can be duplicated by the instances r h A
'**
and
**'
r h A
of the (a) versions. (The right-hand side of a sequent can never be empty in this formulation of L.) These (b) rules are not even necessary for cut-elimination to hold. They do, however, make it possible to give a more systematic and, I hope, more rational treatment of thinning in the cut-elimination procedure, and that is why I have chosen to include them. The thinning rules of an L calculus for deriving sequents composed of sequences of unindexed formulae are not hard to interpret in DT* Right thinning is thought of as operating on the right most formula of a sequent, and left thinning on the left most one. I mention this only to emphasize once more that all variants of Gentzen's calculi can be interpreted within this multiple-conclusion framework without the need to tinker with their rules. To substantiate this claim further, let us briefly consider how to interpret -i-left and -i-right in a multiple-conclusion calculus which treats negation as primitive. There are a number of ways to add rules for negation to DT. Perhaps the most convenient is to follow Kneale and allow rules which have zero premises or conclusions.3 Then rule (7) can be replaced by: (7')
An
"Am
and
(8')
—
(Here * is supposed to indicate that (7') has no conclusion and (8') no premises. It can be thought of as an auxiliary symbol whose function is to close certain assumptions or conclusions. It is convenient to stipulate that, when * occurs as a result of applying (7')> this occurrence cannot be part of an application of (8') as well.) Let D'T be the class of derivations generated by this modified set of rules. Corresponding to D'T is a sequent calculus obtained from LK by removing the axioms for _L and adding the usual negation rules. It is easy to see 2 3
"last" here means "having no application of —+-right below or to the left of it." The Development of Logic, page 542.
110
Normalization, Cut-Elimination and the Theory of Proofs
how this calculus can be interpreted in D'T.
r n
n r h A, An r,-AmhA
corresponds to
r , An h A
corresponds to
AAn
-.A
and
r h A, -,Am Everything else is as before. Notice, however, that the (6) version of thinning on the right is necessary in this formulation of the sequent calculus if r h Ai is to be derivable from V K Let A/rj£,T be the class of derivations obtained by interpreting the rules of NJ as instructions for constructing members of DT, and similarly for NKDT1 LJDT and LKDT. It is an easy matter to establish relationships between these classes similar to those described at the end of Chapter 5. Now, however, NJDT
C NKDDT
C LKDT
C
DT
and NJDT
C LJDT
C LKDT
C
DT.
It is not the case that NJ&T — LJryT C NKJJT, because in the TV calculi thinning is only used when it is needed for an application of V- or 3elimination, whereas arbitrary left thinnings are allowed in LJ. I now want to list some reduction steps for the members of DT- In order to simplify matters, let DT be modified so that An in rule (7) is always atomic and A itself is distinct from _L I assume that the corresponding modifications have also been made to both the classical and intuitionistic negation rules in natural deduction, and to the axioms for ± in the sequent calculi. It is well known that such changes do not affect deductive strength. I propose to describe these reductions without first specifying what it means for a derivation to be in normal form. My reason for so doing is that I am unwilling to commit myself in advance to a particular notion of normal form. Clearly, however, a normal derivation should contain no maximal formula occurrences and should possess the subformula property—or something approximating to it. In addition, I would like to give a sufficiently comprehensive and general list of reduction steps so that the familiar reduction procedures can be interpreted in terms of them. This is my primary concern here, and I will not worry about whether all reduction sequences terminate or about the uniqueness of normal forms until later. To state the reductions I must introduce a further piece of notation. Consider for a moment the simplest kind of proper reduction step in any Gentzen calculus. Roughly speaking, it can be translated as follows:
Reduction Procedures
111
If II has some subderivations of the form II' A
Introduction
f(A)
Elimination
then II reduces to the result of replacing these by derivations of the form IT A The problem is to specify what constitutes a subderivation of II, and what it means to replace a subderivation in II by some other figure. In the case of tree derivations the answers to both these questions are familiar and obvious. The matter is a little more complicated here, however. I want to say that III is a subderivation of II2 if the latter can be obtained from the former by applying a series of rules of inference, and that the result of replacing II1 by II3 in II2 is the figure obtained by applying this same series to II3 rather than III. (In fact, I am really describing a special kind of subderivation here, namely one whose assumptions are also assumptions of II2. Call it an initial subderivation. A more general notion is obtained if, in addition to applications of rules, substitution for the assumptions of IIi is allowed.) The step described above cannot always be expressed in the form
n' A
f(A)
reduces to
n'
A A n" U" even if it is simply the translation of a reduction in NJ or LJ. Suppose, for example, that I want to remove a maximal occurrence of f(A) in the derivation of the left minor premise of an application of V-elimination. This will correspond, on the multiple-conclusion interpretation of the rules of NJ, to the removal of one or more occurrences of f{A) from a derivation II* in NJDT- It may not be possible, however, to represent II* in the form shown on the left above, where the occurrences of f(A) displayed in the figure are those to be removed by the reduction. II* might have the form ni
c ( n2 \ A f(A) A
V n3 / E
D
n4 E
112
Normalization, Cut-Elimination and the Theory of Proofs
or be the result of substituting this latter for the assumptions E in some II5, and so on. (Again, it is the failure of generalized associativity, and in particular of condition (4.4), which is responsible for the present difficulty.) Another possibility is that II* results from the figure shown by applying rule (5) to its conclusions of the form E. In this case, there may not even be a II5 such that II* is of the form {[[[U1/C}Il+/A}U3/D}Ui/E}U5 where n + is
n2
A f(A) A. These considerations underlie the following definition. Definition 6.2 (1) a. For n e DT, the subset S(I1) of DT is defined by induction as follows: i. II € S(n). ii. If IT € S(I1) and II" results from II' by applying any one premise rule, then II" € S(II). iii. If III n 2 eDT € S(II) and R then
rii
n2
Bm
Cn
and
n 2 II! c_n^ _ B_ m_ Dp
are both in S(II), where R is any two premise rule, b. S'(II) is defined like S(II) except for the additional clause: iv. If IT G S'(II) and II" G DT has conclusions of the form A%, then \n"/Ai]n' E S'(n). II is said to be a subderivation of II* if II* G S'(II), and an initial subderivation of II* if II* G S(II). (2) a. So(II) is defined like S(II) above except that (i) is replaced by the clause: i ; . If II' G DT, and Ai appears among both the conclusions of II and the open assumptions of II', then [ n / ^ ] l T G So(II). II' G S„+i(n) iff W G So(n /; ) for some II" G S n (II). b. S;0(II) is defined like S 0 (II) except that S'(II) replaces S(II) in its definition. S^ + 1 (n) is defined like S n + i(II) with S^(II) and S'0(U) playing the roles of S n (II) and So(II), respectively. II* is said to have subderivations of the form II if II* G S^(II) for some
Reduction Procedures
113
rc, and to have initial subderivations of the form II if II* G S n (II) for some n. If II* has initial subderivations of the form II, then II* is the result of applying a sequence of operations to II. These can be taken to consist of substitutions and applications of rule (5), since applications of all the other rules can be treated as special cases of substitution. If E abbreviates such a sequence, II* may be written as y. (Notice that, although the latter denotes a derivation, E by itself merely stands for a sequence of operations.) Suppose that II* E S m (II) and that IT is a derivation with the same conIT elusions as II. It is obvious that y the result of applying the operations in E to IT, 4 is a member of S m (II / ) with the same conclusions as II*. If, in addition, the open assumptions of II' are included amongst those of II, IT then those of v will also be included amongst those of II*. I propose to use E, E', E i , . . . to stand for sequences of such operations in general. Using this notation, the reduction step (6.1) above can be written as IT
-Af(A)
II' reduces to
v
E It is just the special case in which E consists of a single substitution. I will write II* as ,y. to indicate that II* has subderivations of the form II. (/.e., the parentheses around E mean that it may contain operations which require substitution for the assumptions of the derivation under construction.) It might be argued that, since the notion of initial subderivation does not correspond to anything very natural when applied to the derivations of N and L, the reduction steps for these calculi could be better formulated in terms of subderivations. The above would then become IT
-Af(A)
n' reduces to
,v.
The former, however, seems more appropriate in the present context. Both 4 W h a t is intended by this phrase should be sufficiently clear for the purposes of the present discussion. See definition 3 below for a proper account of it and of some related notions.
114
Normalization, Cut-Elimination and the Theory of Proofs
formulations are, of course, equivalent because, if
n' A f(A) A (S) it is always possible to find II" and E' such that IT =
n" A
—— n* = f{A) E'
rr n" and
,£. =
E,
(I expand on this remark below.) The differences between the reduction steps for NJ, LJ and LK cannot be expressed simply by placing conditions on E. From this perspective, (proper) reductions in all three calculi are essentially of the same sort. To represent them we must in general allow the derivation on the left to be a member of S n (II + ), for any n, where IT A
n+ = f(A) This kind of reduction step does not seem particularly natural, however, when the rules for generating D? are considered. (The criterion of naturalness employed here is that a single reduction should remove exactly those maximal formula occurrences which constitute the conclusion of a single application of an introduction and the major premise of a single application of an elimination.) The reason is that substitution is not a basic rule of DT- In view of this, it would be more appropriate to restrict the step described above by requiring that the derivation on the left be a member of S ( n + ) (where II + is as above). There is a sense, therefore, in which a proper reduction step in any of Gentzen's calculi corresponds to a series of one or more natural reductions in DTDefinition 6.3
n IT (1) Given _, and II', _ is defined by induction on the length n of E as follows: II IT a. If n = 0, y is just II and y = If.
Reduction Procedures b. If n = m + 1, ^ results from
yn
115
where E' has length m, by
performing one of the following operations: i. Substituting the conclusions Ai of y, for the assumptions Ai of II" (for some A{ and n") In this case y = yn
and
II" if Ai is among the conclusions
L,/Ai
= _,, otherwise,
ii. Applying rule (5) to the conclusions Ai of _,,
n;. In this case y is the result of applying rule (5) to the conIT . n' IT elusions Ai of yf if such there be, and = , otherwise.
n n' IT (2) Given ,y, and II', ,y, is defined in the same way as ~ except for the additional clause: iii. Substituting the conclusions Ai of II" for the assumptions n
A of
IT IT In this case ,„v = [W'/Ai], ,y (3) Given ,~. and II' such that II, II' G LJDT
and LJDT is closed under
IT . IT the operations in E, / v n is defined in the same way as ( , except that (iii) is replaced by the clause: iii'. Substituting the conclusions Ai of II" for the assumptions n
A-of A% 0 t
(E')
n' In this case
/ v l
IT = [ n / ; / i 4 i ] / v n if
IT /v,x
has assumptions
n' of the form Ai, and
/ v l
= [II"/.4 2 ]n* otherwise, where
n' IE'}
IT s
IT = ,
D • (-^j * the- conclusion of
/V/V)
Bj Notice that, even if the open assumptions and conclusions of II' are
116
Normalization, Cut-Elimination and the Theory of Proofs
n' included in those of II,
y
may have open assumptions which are closed
in v . A similar remark applies to the relationship between , , and , v v Notice also that, IT if II = i y
then
IT II = . ,.
for some £'. Before listing the reduction steps, I introduce one last convention. As remarked earlier, a reduction step is taken to have the general form: If II has initial subderivations of the form IIi, then II reduces to the result of replacing IIi by II2 in II. For ease of writing, however, I will simply display the subderivations directly involved in the reduction, so that the above will be written IIi reduces to II2 rather than *
reduces to §
for any E.
The reductions themselves fall into three groups: I. Proper Reductions These remove maximal formula occurrences.
ni
(i)
n2 -Dm
An
ni reduces to
AABP Bq
n2 -Om
An
Bq
and similarly for Aq in place of Bq.
(2)
n
n
An
An
A V Bm
reduces to
V
Q
and similarly for Bn in place of An. (3)
a. II An
n' Bm A->BP W Br
n(A,/n) reduces to
Aq
n,(Br/m) Br
if Aq is among the assumptions of IT. (Notice that, because II and II' have to be compatible, II cannot have conclusions of the form Aq.)
Reduction Procedures b. II An
II' Bm A^BP Br
reduces to W
117
n n' A
BT
R
otherwise. (4)
n A(v')n VvA(v)m A{x)p
reduces to
n'{A(x)p/n) A{x)p
provided that the figure on the right is a derivation with the same open assumptions and conclusions as the one on the left, where II' is obtained from II by substituting x for each occurrence of v' connected to those in conclusions of the form A(v')n.5 (5)
U{A(x)p/n) A{x)p
n
A(x)n 3vA(v)n A{v')p
reduces to
n"
ir provided that the figure on the right is a derivation with the same open assumptions and conclusions as the one on the left, where II" is obtained from IT by substituting x for each occurrence of v' connected to those in assumptions of the form A(v')p. Comment: The restrictions on (4) and (5) are required because DT is not closed under transformations of the kind employed in these steps. More disturbing than this fact is the impossibility of eliminating all maximal occurrences of universal and existential formulae by means of these reductions, even from derivations without quasi-formulae among their (open) assumptions or conclusions. This can be seen from the following rather trivial examples: 5 Connectedness is defined in footnote 5 of Chapter 3. In the presence of (LT) and {RT), it seems best to stipulate that for the purposes of Clause 2 of this definition Bk does not lie immediately below Ai in a configuration of the form
Ai
£*
nor does Ax lie immediately below Bk in Bk B3
A, •
118
Normalization, Cut-Elimination and the Theory of Proofs
A
\/v{A{v)VA{v))n (A{v')\/A{vf))m A{v')p
\fvA{v)r A(b)t
A(v')q
W'A(v")s
(b)t 3vA(v)r A V
( ')P
and
3v"A(v")8 A V
( ')Q
A(v') A A(v')m 3v(A{v) A A{v))n
(where v' = f(VvA{v)))
( w h e r e =
f(3vA(v)))
It seems better to deny that derivations of the above sort have normal forms than to admit as reduction steps the radical transformations necessary to remove their maximal formula occurrences. To prove a normalization theorem for DT, therefore, it becomes necessary to show first that each II € DT (whose assumptions and conclusions consist only of formulae) can be associated with a derivation in some convenient subclass, like LKDT, having the same assumptions and conclusions as II. Unnormalizable derivations involving only maximal universal formula occurrences can conveniently be excluded by restricting (9V) in such a way that the proper variable of the inference does not occur in any other conclusions of the derivation of its premise (or, if it does, that there is no connection between these occurrences and those in the premise itself). Existential formulae are not so easily dealt with, however. There appear to be no convenient restrictions on the rules of DT which will exclude derivations like the right-hand one above. If they are to be excluded, this is best accomplished by incorporating substitution into the rules and rewriting (10 3 ) to mimic 3-left or 3-elimination. II. P e r m u t a t i v e R e d u c t i o n s These allow inferences to be split up. In particular, given an application of an elimination whose major premise consists of maximal and non-maximal occurrences of some formula, it can be divided into a number of applications of the same rule each of which is such that its major premise is entirely composed either of maximal formula occurrences or of non-maximal ones. This is a necessary preliminary to the removal of the maximal occurrences and represents a generalization of the procedure whereby maximal segments are removed using permutative reductions in JV. (1) Let
n2 l l n —- Cyy
n3 if CT is among the conclusions of II2, and let II 2 = n 2 otherwise. Also, let III Tl = Ap
n2
Reduction Procedures
119
and assume that IIj has conclusions of the form Cr • Then
n reduces to
n3
\S
S*-p
n3 ir2'
IIi Bp and n' = cm A - 5 , n2 Also, assume that II2 has neither open assumptions of the form Ak nor conclusions of the form Bp and that Cm i=- A —• Bq. Then
Let
n=
rw
n Cm
reduces to
n2
IT Bp
A-*Bq
{k}
C o m m e n t : The above, taken together, are equivalent to the LK conversion which allows a cut to be permuted upwards past any inference. To translate this same conversion step for LJ (and, a fortiori, the permutative reductions for NJ and NK) (2) is not needed, only the special case of (1) in which II1 is of the form
n* Dq
A.p
n4 where Cr is not among the conclusions of 11* nor Ap among those of II4. In other words, the only permutative reduction needed for LJ is IT JDq
n* Ap
IJq
reduces to
n4 n2 KJf
Ap
/n 2 \cr
/n 4 \ \cr\
IT3 3 Notice incidentally that, when II2 has no conclusions of the form C r , (1) becomes a symmetrical transformation, namely,
ln .
W
IIi Ap
IIi Cr
n 2 n3
reduces to
Cr
Ap.
n3 n2
III. Thinning Reductions These are of two kinds: A. pruning reductions which remove inferences from the derivation of the thinned premise in an application of (LT), and from the derivation below the thinned conclusion in an application of (RT), and
120
Normalization, Cut-Elimination and the Theory of Proofs
B. permutative reductions which allow applications of (LT) and (RT) to exchange places with another inference. Reductions of kind (A) are needed to ensure that normal derivations possess the subformula property. As for those of kind (B), they are intended to take care of any "maximal segments" which may arise because of the presence of the same formula in the premise and conclusion of a thinning. When it comes to the choice of thinning reductions, there is a certain incompatibility between the claims of reason and expediency. In addition, because they are not given much consideration in the usual treatments of normalization, there is little to guide decisions about which to include. The following, therefore, is intended to be a comprehensive list of the various possibilities. I do not mean to suggest that all of these reductions are necessary, or even desirable, nor to rule out the possibility of placing some restrictions on those which are found to be acceptable. A. Pruning Reductions (1) a.
n„
c,
A? Up
Ill
n» l
IIi ^u
A
n
w
s to
Gp
Q
Bk
'_ Cp
CP
cn
Bj
Cp
Ca
B
Z
B™ are where A\x,..., A™n are all the open assumptions of II, and B]x all its conclusions with the exception of Bk> (Here and below p is supposed to be some new index.) b. If II has no open assumptions
II' Ci
n ir
Uv
reduces to Ca
un B 3rn
Bj
Reduction Procedures
121
where B}x,..., Bf^ are all the conclusions of II with the exception of Bk> In the special case where II has only Bk as its conclusion, the figure on the right is to be interpreted as U'(Cq/i). (2)
n Op
n Ci Cq Bk
—n—7*
reduces to
Op
ir
B
C
i,
P
L/p
Cq
BTm
where A\x,..., Afn are all the open assumptions of IT with the exception of Bk, and B} . . . . , BY1 are all its conclusions.6
n(B fc/j )
n
(3) Ai
reduces to
Bj
Bk
Bk
n
(4)
n(sfe/j) reduces to
Bj Bk
Bk
Ai
n
(5)
n n' 11'
reduces3
-^71
tO
\^T
Ck
ck (6)
n An
n ^„
n' reduces to
/l
m
1
n' r
Og
c~k where r occurs nowhere in 11 or 11'. 6 In Dip (the version of DT with rules for -i, rather than axioms for _L) the possibility arises that IT may have no conclusions. In this case, the figure on the right is taken to be
n(c,„) whenever Bk is the only open assumption of IT.
122
Normalization, Cut-Elimination and the Theory of Proofs
Comments: (a) (1), (2), and (3) are sufficient for interpreting the reductions employed in the usual normalization/cut-elimination procedures for the various Gentzen calculi. Unless they are restricted in some way, however, normal forms will not be unique. (b) Notice that (1), (2), and (6) have the property that the figure on the left has the same open assumptions and conclusions as the one on the right. In (5), on the other hand, the figure on the right may have additional open assumptions. B. Thinning Permutations (i)
iii Ai
n2 Bj
reduces to
~W^~
n2 II i Bj
At C~qR
Cd Cd and vice versa, where R is any single-premise, single-conclusion rule. If the step is read from left to right, I assume that the only occurrences of Bk as a conclusion in
ni n 2 Ai
Bj Bk are the ones displayed and q is supposed to be a new index; furthermore, if R is an application of rule (5), I assume that no assumptions in Iii are discharged by it. If the step is read from right to left, I assume that the only occurrences of Cq as a conclusion in
n2 Ei
cq are the ones displayed and k is supposed to be a new index; furthermore, if R is an application of rule (5), I assume that no assumptions in rix become closed by it. (2) a. iii n 2 n2 Ai Bj reduces to IIi Bj
and vice versa, where R is any two conclusion rule. If the step is read from left to right, I assume that the only occurrences of Bk as a conclusion in
ni n 2 Ai
Bj
Bk are the ones displayed and q is supposed to be a new index. If the step
Reduction Procedures
123
is read from right to left, I assume t h a t the only occurrences of Cq as a conclusion in
n2 Bj Cq
Dn
are the ones displayed and k is supposed to be a new index. (2)
b.
iii
n2 Bj Bk
n2 Bi
reduces to R
R
Cq Cn and vice versa. Again, R is any two conclusion rule and the same assumptions about Bk and Cq are made as in (2)a. "m
(3)
a.
^n
Dn
n2 n3
Eh n 2 At
At
reduces to
Bj n 3 Bk Cn R Dm
Iii Bj Cn R Ai Dq Dm
and vice versa. (3)
iii
n2
n 3 At
B3
b. Cn
Bk Dn
n3 n2 reduces to
}
Iii Cn Bj R M Dg
D„
and vice versa. In b o t h cases, R is supposed to be any two premise rule; furthermore, if the step is read from left to right, I assume t h a t the only occurrences of Bk as a conclusion in
ni n2 Ai
BL
Bk are the ones displayed and q is supposed to be a new index. If the step is read from right to left, I assume t h a t the only occurrences of Dq as a conclusion in
n2
n3
Bj
Cn Dn
or in
n3 n2 Cn
Bj Dg
124
Normalization, Cut-Elimination and the Theory of Proofs
in the case of (3)b, are the ones displayed and k is supposed to be a new index.
n
(4) Ak
n reduces to
Bi
R cJ C Bj n
where R is any single-premise, single-conclusion rule, and vice versa. Also, the usual assumptions (i.e., those made in (l)-(3) above, adapted in the obvious way to the present case) about Ak and Cq apply. (5)
a.
n
n
Ai
Ai
Ak Cn
Dr
reduces to
Bi
R
Cn
R
Da Dm
Bj
and vice versa. (5)
b.
n
n
Ai
Ai
Ak
reduces to
Bi ^n R
Urn
3
Dn
&n
Dm
Bj
and vice versa. In each case, k is supposed to be any two conclusion rule, and the usual assumptions apply to Ak and Dq. (6)
a.
M
Cn
Ak^
Bj
^n
Ai
Dm
Bj
reduces to
R
R
~DZ~
and vice versa. (6)
n2 nx
ni
n2
nx n 2
b. ni At C
Ak
n
Bi
reduces to
Dg
0
n
i /
-m
B.
and vice versa. In each case, R is supposed to be any two premise rule, and the usual assumptions apply to Ak and Dq. Some additional permutations are needed for D'T to alter the position of a thinning with respect to applications of the negation rules. The following will suffice for this purpose: (7)
II'
n
->AP ~>An Bk
reduces to
n At
Bk
IT -*An
and vice versa. If the step is read from left to right, the only occurrences
Reduction Procedures
125
of-.A n as a conclusion in
ir -»4» ->^n Bk are the ones displayed and t is supposed to be a new index; if the step is read from right to left, the only conclusions of
n Am At
Bk
having the form At are the ones displayed and n is a new index. (8)
nx n2 iii n Bk ~^Ap reduces to Bk Am n2 ~*An At ~~*Ap * * and vice versa, where similar assumptions apply to -*An and At as in the case of (7). (9) * * ->Ap Am reduces to -*An Aq -"An Bk Am Bk and vice versa, where q is any index if the step is read from left to right, and similarly for p if the step is read from right to left. IT Am
(io)
n Bk
* ~^Ap Am -*An
* reduces to
-iAn
n *
B
A
«
A
and vice versa—p and q as in the case of (9). Comment: These permutations are more than are needed for normalization. To reduce maximal segments to maximal formula occurrences, it would be sufficient to allow them to go only one way, either from left to right or from right to left, and apply only to certain rules. My motive in presenting them as I have done above is to leave open the possibility of equating derivations which differ from one another only by some permutations of thinnings. This concludes the list of possible reduction steps.
7
Correspondence Results I now want to consider briefly the question of a correspondence between the steps described in the preceding chapter and reduction in LK. I will adopt for a moment the terminology according to which the derivations of LK are mapped onto LKpT. Call this mapping (/> and let d, d ' , d i , . . . range over sequent derivations, d >i d' means that d reduces to d' by applying any one of the reduction steps listed in Appendix B, and > is the transitive closure of > i . It can reasonably be claimed that the reduction procedure characterized by > is in essence the familiar one which derives from Gentzen. It consists of three kinds of reduction step: the elimination of a cut one of whose premises is a logical axiom, the replacement of a cut by one of lower degree, and the permutation of a cut upwards past another inference. Unlike the traditional procedure, however, these steps are not applied in any systematic order (as determined, say, by the degree of the cuts or their position in a derivation); if a step can be applied, it may be applied. The other significant difference concerns the treatment of thinning. Ignoring distinctions which cannot be made in the usual formulations of the sequent calculus, it is fair to describe the reductions listed in the appendix as allowing any application of thinning to be permuted upwards. Although not part of the traditional procedure, these are perhaps not such a radical extension of it as they may appear to be, for if d! is obtained from d by such a permutation (or indeed by permuting any other inference upwards), we can find d1? d[ such that <j>(d) — >(di), <j>{d!) — <£(di) and d\ reduces to d'i by an upward permutation of cut. In a sense, therefore, the traditional procedure already sanctions the upward permutation of thinnings. (Once this is granted, my formulation of the steps for reducing the complexity of cuts—according to which any necessary thinnings are applied above the new cut rather than below it—can be seen not to differ significantly from the customary ones.) My aim is to characterize in terms of some subset X of the reduction 126
Correspondence Results
127
steps listed in the previous chapter a relation yx between members of LKDT such that the following hold: Theorem 7.1 IfU yx W, then d >i d! for some d,d' such that >(d) = U and
,11' = ^
, and "II x reduces to Iii"
is an instance of one of the reductions in X. Let X include all the steps in (I) and (II) together with (IIIA1) and (IIIA2). In addition, X should contain the reductions in (IIIB) restricted as follows: (1) If R is an introduction, the step applies only from right to left. (2) If R is —^-elimination, (3) and (6) are replaced by i. (36) and (6a) from left to right.
n.
n'
n
A-+Bm A? ir A —• Bk Cn reduces to Aq cn A —• Bm Br Br where the only occurrences of A —> Bk as a conclusion in n Ap
IT
A-Bm A-+Bk iii.
Cn
are the ones shown, and q is supposed to be a new index. IIi IT Ux n II Cn A —• Bm reduces to Cn Ap W Ap A^Bk Aq A^B, Br Br where the only occurrences of A —• Bk as a conclusion in n;
iii ^n
A —» Bm
A^Bk are the ones shown, and q is supposed to be a new index. (3) If R is any other elimination, the step only applies from left to right. 1 1 (2ii) and (2iii) above can be included in X because, although they themselves are not members of (IHb), they can easily be derived from this group of reductions. ((2ii) is obtained by an application of (6a) from left to right, followed by an application of (6b) from right to left. Similarly, (2iii) is obtained by an application of (3b) from left to right, followed by an application of (3a) from right to left.)
128
Normalization, Cut-Elimination and the Theory of Proofs
(To establish a correspondence between the version of LK with rules for negation and LKJJT , the image of this calculus under >, X must be augmented by (7) and (8) from left to right, and (9) and (10) from right to left.) It is not surprising that (IIIA3-6) are omitted from X. (IIIA3) is a part of the normalization procedure for N, rather than of cut-elimination for £, whereas (IIIA4-6) are best described as experimental steps. The restrictions on the members of (IIIB) are also to be expected. From one point of view thinning permutations are all alike in the sense that, if one is allowed, there is no reason to exclude any other; the members of (IIIB) were selected with this in mind. On the other hand, from the more limited perspective of a cut-elimination or normalization theorem, the criteria for including such reductions are stricter and take into account only what is necessary and convenient, given the format of the rules, for proving the theorem in question. Theorems 7.1 and 7.2 are established by lengthy and mostly routine inductive arguments. I shall merely outline these below, considering only one or two of the more interesting cases in any detail. Proof of Theorem 7. 1: If
ni . £ ^
n; S,
we show by induction on the length n of E that, for some d!', d such that (j){d) = ^
and cj){d') =
^
,d > x d'
Basis Step: n = 0. There are various subcases to consider according to which step transforms IIi into 11^. For reductions from group (I), it should be obvious that the result holds by virtue of the corresponding reductions in group (B). 2 Consider, by way of example, (II): If (j){di) = IIi, and (j){d2) = II2, then the image under <> / of
di
di AnhAn r\-A';Bm An; T'\-A;AABp
rt-A;An
BqVBq A A Bp r- Bq
Aw;r'hA,Bq
r,r'hA,A',£, is
ni
n2
AABP Bq. 2 See Appendix B below for a detailed description of the version of LK assumed here, together with a complete list of reduction steps for this calculus.
Correspondence Results
129
A single application of (Bib) to the former yields d2
rfx ri-AMn
r'hA';Bm An,r\-A',Bq BqhBq An;T'\-A',Bq r,r'hA,A',B,
er (j) whose image under <j>is is
ni
n2 Bq.
In group (II), (1) is taken care of by (C13), and (2) by (C16). Turning to (IIIA), consider first (la): Let <j)(di) = Hi (1 < i < n),
for {*£,...,££}.)
rt _.
d' d T h A'; Ct di A},;.. ;A£ \~A;Bk ft;r'hA',C, h A i ; A\ At;...; A? ; P h A, A', Cq 4;...;A?;r„ri-A1,A,A',C,
dn rnhAn;^.
."
r„,... 1 r l ,r'hA n ,...,A l ,A,A',c, An application of (E4) to the above gives d' d1 T'\-A';C, Tx h A i ; Al Aj;...; A? ; V \- A, A ' , C , . ^;...;i4?B;r1,r'l-A1,A,A',C,
Tn\-An;Al
r^.^ri.ri-An
A^A.A',^
The double horizontal line is meant to indicate a series of thinnings. 3 These can be assumed to have been ordered in such a way that the image under > of the preceding derivation is just the figure on the right in the statement of (IIIAla). (lb) is just a special case of the above. As for (2), it is handled in the same way with (El) being used instead of (E4). It only remains to consider 3
A fuller explanation of this notation can be found in Appendix B.
130
Normalization, Cut-Elimination and the Theory of Proofs
the reductions in (IIIB), but these are routinely handled by their analogues in (F5-16) or in (G). For example, when R is rule (1), (IIIB3a) is treated as follows: Since rule (1) is an introduction, it is sufficient to find d and 6! such that
rii n 2
n2 n3
*4=£jy*. «'Q~i4rL£: BACm
and d>id
'
BACm
So, let Hi = (j)(di) (1 < i < 3) and take d to be d2
^3
r2hA2;^ r3hA3;Cn di r2,r3hA2,A3;gACq r t h Ai; ^ ^ ; T2, T3 h A2, A3, B A Cm r1,r2,r3hA1,A2,A3,5ACm An application of (F14bi) to the above gives d2 T2hA2;Bj
d3
di Ai;r2)r&2;Bk r3hA3;Cn TihAi;^ 4r2,r3hA2,A3,BACm r1,r2,r3hA1,A2,A3,SACm and this latter derivation can serve as d!. The remaining members of (IIIB) are dealt with in a similar manner. This establishes the basis of the induction. Induction Step: n = m + 1. There are only two cases to consider according to whether the last operation in E is substitution or an application of rule (5). Both follow trivially from the induction hypothesis, however. This concludes the proof of Theorem 7.1. d Proof of Theorem 7.2: The proof of Theorem 7.2 is similar, albeit a little more complicated. Given d and d' such that d >i d', we show by induction on the number n of inferences below the cut or thinning to which the reduction is applied that
Correspondence Results
131
use of conversions from (II) and (IIIA). As an example, I consider (B2a) below. The other cases are similar (and usually simpler): So, let d and d! be the figures on the left and right, respectively, in the statement of (B2a) in Appendix B; furthermore, let V be {A\x,..., Al} and A' be {/?/,,..., B£ }. Then
4>(d) =
4>(di) Bi Ay Bk Ap Bq
and
4>(d')
*L_
=
Bs Bs
B^
.Bh
Ba Bq
By £'
7"
where (for any r)
, is the result of substituting the conclusions Bq of 7-
r for the assumptions Bq of 4>{ds), and ^ is the result of substituting the conclusions Ap of r for the assumptions Ap of ^(cfe) and then substituting the conclusions Bq of the derivation thus obtained for the assumptions Bq of 4>{ds). (The usual provisions about subscripts apply to avoid unwanted clashes. This presents no problem since alike derivations are treated as indistinguishable from one another. In particular, I assume that Bq & T U T'.) Now,
Bl
B1 R
=
A q
y
P
Bq
Ap 4>{d2)
Xi <j>(d').
(The first reduction is justified by (12), the second by (IIIA2).) Notice that, in the case of (B2b), an application of (III) would be
132
Normalization, Cut-Elimination and the Theory of Proofs
required before (IIIA2) is applied, followed by a second application afterwards—the inverse of the first—to transform the resulting derivation into 0(d'). C l - 9 . In all these cases, it follows from the properties of substitution that (f)(d) and
n; n 2
B\An reduces to
n
3 Similarly, if it —•-left, take fix to be
n; Bm
B *p>
so that (again assuming that Cr is among the conclusions of n 2 ) (III) becomes:
/nj |
£>TH
V
\ t>
n2
•
A
n; Of
n
)
CT
n3
reduces to
n3
Bm
B-*An Ap
n2 Or
n3
(4) It only remains to consider (CIO), the case in which the last rule See equations (5.4) and (5.5) above.
Correspondence Results
133
on the left is A-right. To simplify the writing I will omit subscripts and assume that the cut-formula occurs in both premises of the application of A-right. So, d and d! can be taken to be T\-A;C;A r'\-A';C;B T, Tf h A, A', AAB;C
d3 C; T" h A"
r,r',r"h
A,A',A",AA#
d% C;T"\-A"
d2 r'hA';C;B
and d\ FhA;C;A
r,r"FA,A";A
c/3 C',T" h A"
r',r"l-A',A";S A,A',A",AAB
r,r',r"h
respectively. Writing IIj for 4>{di) (i = 1,2,3), we must show that
n
c n3
t
n;
n2
A
B AAB
where
ru n = A
n2 iij B and Yl'} = C for j = l,2
n3
AAB Now EI can be written as
[njA] ( [ n 2 / B ] ^ _ | - ) hence by (III) (7.1) 11
C y,
n3
[n'JAjt( v
'^W^AA/0.
But, by (5.4), (7.1) can be written as
[ n i M ] [ n 2 / B ] ^ J - / c n3 which by (5.5) is the same as (7.2)
[n2/5][ni,'^-TAH n3
An application of (III) to (7.2) yields
[U'JBm/A}^^-
n3)
134
Normalization,
Cut-Elimination
and the Theory of Proofs
which is just
ni n'2 A. B_ AAB D l - 5 . In these cases, the fact that 4>(d) =
n'
n
A KB,m
>-i
A A Bm
n3 -
and
n" =
ni AABm
AABS
n3
where n'3' = (A A Bsjm;)n'3,
n'
II
n'2
, Ai Bj , AABm
ni
AABt
n2 ,
AABt
AABS But this is just the instance of (III) in which IIi = II", Ii 2 = A A B^ and n 3 = n^. 5 E. The translation of (1) under (f> is an instance of (IIIA2). Similarly, (4) translates into an instance of (IIIAla) (or (IIIAlb) if T is empty). (3) too translates into an instance of (IIIAl), the only complication being that it is necessary to replace the subscript i whenever Ai G A— but this presents no problem. It only remains to consider (2). Let T = {Al,...,Al}, A' = {B}^...,BfJ and assume A% £ A', then we must show that Aj n Ai Bk y Ai
The term on the left in each equation refers back to the formulation of (III) in Chapter 6. According to this same key, Ap is replaced by A A Bs and Cr by A A Bm.
Correspondence Results where
A1
135
Aj
2±L-
A? U = B 3\
BV But Ai tfdi)
Bk 4>(d2)
^
yx
Ai
Bkl
Bk
<j>(d2)/Ai 4>{di)
by (III), from which
n Ai
Ai_
and
n ' = Cfc
4>(d') Ai
Aj in other words, that (7.3)
£VFTO En F0
En
4>(d)/F0 4>{d')/Ai
Ck
A% Ai
136
Normalization, Cut-Elimination and the Theory of Proofs reduces to 1
EVFni \En F0
En
U/F0
n'
Ck
At
But, by (III), (7.3) reduces to E V F7 En
En 4>{d)IAi
A,
Fa
n'
and this is the same as (7.4), since by (5.4) E\/F„ En F.
En
m/Ai
Ck
A{
EVFm En F0
En
n
F 5 - 1 6 . Roughly speaking, the reductions in this group all translate into instances of (IIIB). There are, however, some minor exceptions and a few complications. Notice first that (j>(d) = <j)(d') in the case of (F9aii) or (F9bii). The remainder of (F5-10) all translate directly into members of (IIIB). When we consider (Fll-16), on the other hand, the situation is complicated by the fact that the active formula in the premise of the thinning being permuted may also occur in the premise(es) of the preceding inference. (Of course, this may happen in (F5-10) as well, but it causes no difficulty in these cases because they deal with left rules. As a result, given d and d' such that d >i d' by one of these reductions, we can find d" such that d" > i d! by the same reduction, 4>(d) = <j>{d") and the active formula in the premise of the thinning being permuted is introduced by the preceding inference.6) If it does, (III) must first be used to reduce cj){d) to a >(d*) where d* is like d except that the final application of thinning has been split into two in such a way that
As an example, consider (F5a). Let d\ and d[ denote the derivations on the left and right, respectively, in the statement of this reduction step, then we can take d" to be {A3/i)d
Aa;(Bj);rh A As;(Bfe),(Cn),rhA R Bt;(gfc),(Cn),rhA
Bfc,cn,rhA It follows from the familiar properties of substitution that (p(d^) =
Correspondence Results
137
So, there are really two kinds of case to consider. Let us take (Flla) as typical of the one, and (F16a) as typical of the other: F l l a . To simplify the writing, assume that Ai is distinct from Bk and C n , then d and d! can be taken to be di
di
_rhA;(5m);A rhA;(Bm);Ai rhA,(£fc),(Cn);^ R and ThA;B, r\-A,(Bk),Cn;At R rhA,5fc,cn rhA,^,cn respectively. Let IT denote
otherwise. Now, (p(d) can be written as
L^)/^]AySm] BkBn C
n
but this either is [Tl'/Ai]
Ai
Bm Bk Cn (if Bm is not among the conclusions of
/ M \n!/Ai]
At_ Cn
(by (IIIB4) from right to left), which is just <j>(d') by (5.4). F16a. We can take d and d' to be di
r;^hA;(Bro) r,Aj\-A;Bm r,Aj\-&,B *,. ~ ,k~,C*n, ~ „
and
r;^KA;(Bm) T;Ai\-A,(Bk),{Cn) r;At\-A,Bk,(Cn) r,^l-A,Bfc,CB
respectively. Also, let IT' be as above, then (j)(d) can be written as Aj m A{
Djn J
Ai
Bn
Hdi)/Bn
Bk
This reduces by (III) to Bm Bk Cnl
Ai
n'
Cn
138
Normalization, Cut-Elimination and the Theory of Proofs whether or not the conclusions of >(di) include Bm. latter derivation reduces immediately to A, At _Ai
Bk
Now, the
I C n / A{ I
ir
by an application of (IIIB5a) from right to left, and this last is just cj){d') (again, by (5.4)). G. These reductions all translate into instances of members of (IIIB). More specifically, the image under 0 of (la) and (2d) is (IIIB5b) (in either direction); (lb) and (2b) translate into instances of (IIIB6a) from left to right, while (lc) and (2c) translate into the same reduction, but from right to left. Finally, (Id) and (2a) translate into instances of (IIIB3b) (in either direction). This completes the basis step of the induction.
Induction Step: n = m + 1. Suppose that the result holds when the reduction is applied to any inference with no more than m steps below it. Let d >i d', and assume that the number of steps below the inference I being reduced is m + 1. We must show that 0(d) y 0(d'). Let d\ be the immediate subderivation of d which contains J , and d[ be the result of applying the appropriate reduction to J , then (j)(d\) y
0(di) Aj B->Ak
0(<*i) and {n}
Aj B-+Ak
{n}
respectively, in the case of —>-right), for some Ai and II. Hence, it follows immediately from the induction hypothesis that 4>(d) y
Correspondence Results
139
Lemma 7.4 If Hi >-\ H2, n is any member of LKr)T and Ai is any conclusion ofH, then [U/Ai]Ui >-i [ n / ^ ] r i 2 (by an application of the same reduction step). Proof of Lemma 7.4
ft ft Since II1 >-i II2, III can be written as * and II2 as 2 , for some E, where "III reduces to n' 2 " is an instance of one of the steps listed above. The lemma is proved by induction on /(E), the length of E. Basis step: /(E) = 0 In this case it is obvious that "[n/i4»]IIi reduces to [ n / ^ J I V is an instance of the same reduction step. (Strictly speaking, this has to be verified in the case of each reduction step with the aid of the definition of substitution, but it is a trivial matter to do so.) Induction Step: /(E) = n + 1 Suppose that the lemma holds for all E" such that /(E") = n, and let E' denote the first n terms of E. There are two cases to consider: i. The last term of E is an application of rule (5). Now, §
yx "'?, hence [U/A^]
^
[U/A^
by induction hypothesis. But it follows from the definition of substitution that, if II has no open assumptions of the form Cn or conclusions of the form Dm, for any II':
[n/Ai]W Dm C-+Dp
"' D \C^Dpm
VIM ( {n}
w,
So, in particular,
Dm C^DP
? 0-
[n/At] {n}
ii. The last term of E is an application of substitution—for the assumptions Bk of some II', say. Again, it follows from the induction hypothesis that [U/A^]
M
[Il/A^
Furthermore, by (5.4), if Ai is not among the assumptions of II' nor Bk among the conclusions of II,
[ n / ^ y ^ j i r = [n/^iQ^/sfcjn')
140
Normalization, Cut-Elimination and the Theory of Proofs IT (j = 1,2). But the term on the right is just [ I I / ^ ] ^ , hence [U/A^
yt
[U/A^
This completes the proof of the lemma and of Theorem 7.2.7
•
Whether all the reduction steps listed in the preceding chapter are considered, only that subset of them which corresponds to reduction in LK, or the steps for LK listed in Appendix B, it should be apparent that normal forms are not unique nor does every reduction sequence terminate. The latter feature can be attributed to two factors. The first is that some reduction steps are symmetrical. These comprise various instances of (III) as well as the thinning permutations in (IIIB); in the case of LK, they are basically those permutations which involve only cuts and thinnings, and do not result in the splitting up of an inference. The reduction sequences generated by steps of this kind can contain only a finite number of distinct terms—although some of them may be repeated infinitely many times. For this reason, they seem not to pose a serious problem. The same cannot be said of the second factor, which is that certain reductions, when applied to a derivation, may yield a more complicated one. Here again I am thinking of (III) or, in the case of LK, those steps which allow a cut to be split up, namely, (Cl-4), (C10-13), (Dl-5) and (D6-9). 8 It was this phenomenon which was exploited by Zucker to produce an infinite non-repeating reduc7
Using this lemma, it is easy to show (again by induction on / ( £ ) ) that, if n = 5A, II' = 5 ? and "III reduces to n 2 "
is an instance of one of the reduction steps, then n >-i II 7 —thus substantiating the claim made earlier that nothing is lost by formulating the reduction steps in terms of initial subderivations. 8 A11 I intend here is to draw attention to the fact that some reduction steps enable us to generate infinite non-repeating sequences. As a matter of fact, I am hard put to explain what it means for one derivation to be more complicated than another in the present context. A necessary condition seems to be that the former should contain more inferences or vertices than the latter, but this is not sufficient. For example, although the application of (I3a) may increase the size of a derivation, it seems inappropriate to assert that it also increases the complexity—because the strong normalization theorem holds (for NJ), if for no other reason. Yet it is notoriously difficult to specify what kind of simplification is accomplished by this step. (A way to do so, and hence to define a measure of complexity which decreased with each application, would yield a simple and direct proof of strong normalization.) In the case of LK, a thinning permutation such as ( F i d ) provides an example of a reduction which, although it may increase the number of inferences, seems to simplify rather than complicate a derivation. The situation here is further obscured by the fact that applications of at least two different reduction steps are needed to generate sequences of the kind described above. It is perhaps unreasonable, therefore, to claim that any particular step is by itself responsible for an increase in complexity.
Correspondence Results
141
tion sequence for LJ. His example can easily be adapted to the case of LKDT, as the following shows. To simplify the notation, I shall not write in subscripts; clearly, nothing is lost by this omission. Recall that by (5.5)
ni n 2 A B
n
may be used to denote ambiguously
n2
ri!
[Ui/A] B and [U2/B] A
n
n
(provided, of course, that II2 has no assumptions of the form A and II1 has none of the form B). Now, suppose that ITi and II2 are of the forms
ir
n"
CD
A
ni n2 A
diiu
F
ni' n2' B
A
respectively. Then,
B
n'
nx n2 A
E
C
B
n
D
A
n2 n2 B\
=
n; rr2
^1
A
B2
rr
n+ n+ (by (III) and (5.5)) where
n'
n2
C
n+ = B n
and
IT
=
D
ni ir2 A
Bx
B2
A
n
n
As is apparent,
rii n2 A
B
n2 n2 and
n
Bx
n*
B2
have the same form and II is a proper subderivation of II*. (III) can therefore be applied once more to obtain a figure of the form
n" n2
n2
#2i
B22
E where
11^ =
ni' B\
n2' B2i
n*
B\
B22
n*
142
Normalization, Cut-Elimination and the Theory of Proofs Clearly 11* properly includes II*. Furthermore, (III) can now be applied
to
n2 n 2 #2i
&22
n* and so on ad infinitum. With each application, the derivation which results is of increased size. (The use of the subscripts 1, 2, 2i and 2 2 on B is, strictly speaking, an abuse of notation. They are only intended as an informal device to make matters clearer by keeping track of various occurrences of B between figures.) As I remarked in Chapter 2 above, this sort of example cannot be carried out in the sequent calculus except under special circumstances. In particular, it does not apply to the version of LK presented here. The difference between LKDT and LK in this respect is accounted for by the fact that, if d is obtained from d' by a reduction step in (Cl-9) or (Dl-5) (i.e., by permuting a cut with the last inference in the derivation of its righthand premise, or by splitting the cut-formula in the right-hand premise of a cut), (j>(d) = <j>(df). As a result, the translation of a reduction sequence from LKDT back into LK may involve applying these steps from right to left, as well as from left to right. This is the case in the above example. It is easy to check that an infinite non-repeating reduction sequence analogous to the one described above can be generated in LK if the reduction procedure is augmented in this way. The conclusion to be drawn from all this, however, is not that there can be no infinite non-repeating reduction sequences in LK. I present an example of one below. Notice that it depends essentially upon allowing more than one formula to appear on the right-hand side of a sequent. In this respect, it differs significantly from the preceding example, which applies to LJDT no less than to LKDT. Again, I shall omit subscripts and all parts of the derivation (e.g., side-formulae) which do not affect the situation. Reading from top to bottom, each one of the figures below reduces to its successor by an application of (C4) or (C13). (7.5)
CY- B
A h C\ B
B hE
A\-B
B-E\~D BY- D
Ah D (7.6) B\-E
AV-C\B
B;E\-D B\- D A\-C;D
C\- B C\~ D AhD
BhE B;EhD BY- D
Correspondence Results
143
(7.7) AhC;B B\- E AhC;E
A\-C;B B;EVD A;E\-C;D A\-C,D Ah D
BhE CVB C\- D
B,E\-D BhD
(7.8) A\-C,B BhE A\-E\C
A\-C;B B \-E C h B B \- E C h B B\-E A-S\-C\D CVe C;£\-V A\-C-,D C\-V AV-V Now, it is clear that any reduction step which applies to (7.5) can be applied with the same result to the part of (7.8) which is written in calligraphic characters. So, this sequence of steps can be repeated, beginning this time with (7.8), to obtain a larger derivation, and so on ad infinitum. Other similar examples of infinite non-repeating sequences can be constructed, but the above is as simple as any. I turn now to the issue of uniqueness of normal forms. The above considerations are already sufficient to rule out the possibility of each derivation having a unique normal or cut-free form. (For the purposes of the present discussion, I will regard as normal any derivation of LKDT which has the subformula property, and lacks maximal segments and formula occurrences.) In the first place, some of the symmetrical reduction steps, most notably the thinning permutations, apply to derivations which may already be normal or cut-free. By itself, this is perhaps not so disturbing since it implies only that a derivation may reduce to a finite number of normal ones and that these are all reducible to one another by means of such permutations. In fact, however, a derivation may have infinitely many distinct normal forms. This is obvious in the case of LKr>T, since each derivation in the above example of an infinite reduction sequence may be normal, and it is also true for LK. (To see this, suppose that AY- C\B, C h B, B h E and B\ E h D all have cut-free derivations and that C, B and E are atomic. Then, it is an easy matter to specify a reduction procedure, namely always eliminate the left-most cut with no cuts above it, which will yield a distinct cut-free form for each term of (7.5)-(7.8).) In addition to the problems caused by the failure of strong normalization, there is a further difficulty which stands in the way of uniqueness. In LK it has to do with the manner in which cuts are to be eliminated, and in LKDT it concerns the pruning of redundant parts of a derivation {i.e., those attached by thinning). Notice first that the proper reduction steps for LKDT a r e conservative: each application of any one of them removes exactly two inferences from a derivation (an introduction together with the elimination following it). Those which remove maximal formula occur-
144
Normalization, Cut-Elimination and the Theory of Proofs
rences whose principal connective is A, V or —• differ from the conventional reductions in this respect. (This holds for —• only when the occurrences are introduced by an application of rule (5) which discharges no assumptions.) The usual procedure is to remove some or all of the subderivation which culminates in [or derives from] the redundant premise [or conclusion] of the inferences eliminated by the reduction—whenever such redundancies occur. How much of this subderivation is to be removed depends upon considerations which vary from calculus to calculus. Now, as an illustration of the difficulty which presents itself, consider a derivation of the form (7.9)
n' A CV A C A
n n" A B AAB A
where II, n ' and n " are assumed to be normal, Applying (II) and (12) in any order yields (7.10) ir
C
A
n n" A
B A The problem is that, although this latter derivation contains no maximal formula occurrences, it may not be normal, i.e., it may lack the subformula property. A solution is provided by the pruning reductions in (IIIA). Unfortunately, however, when used without restriction, they lead to distinct normal forms. In particular, they can be used to convert (7.10) into . or IT into .. Before discussing the possibility of restricting them in some way, it is worthwhile to consider how this matter is handled by the reduction procedures for NJ and LK. In NJ (7.9) cannot be represented by a derivation whose last inference is A-elimination, rather it corresponds to one of the form [C]
rr
A CvA
n n" A B AAB A
[A]
Correspondence Results and this reduces to
145
. no matter which maximal formula occurrence is
operated upon first. In LK, (7.9) corresponds to a number of different sequent derivations, among them the following: (7.11)
d \-A
d' \- A \-CVA
d" C\- B A\- A C\JA\-B;A
\-B;A \- AAB;A
A\- A AAB\-A \-A
and (7.12)
df \- A \-CVA
d d" \- A CY- B A\- A C\-AAB AAB\-A C\-A CV AY- A h A
A\- A
(d, d' and d" are supposed to be such that (j)(d) = II, (j){d') = 11' and cj){d") = II". Also, I have not bothered to write in any side-formulae.) No matter how the cuts are eliminated from (7.11) and (7.12), the result is that the former reduces to d and the latter to d!. This would be perfectly satisfactory were it not for the fact that the following derivation can easily be seen to reduce to both (7.11) and (7.12): d' \- A \-CvA
d" C\-B AY- A CVA\-B;A h B\A
d \- A BY- B A\- A B \-A A B A A B Y-A BY- A \- A
The problem is that when a cut is eliminated or reduced in complexity, its location in the derivation affects the result of reducing it. Unfortunately, however, this location is not in general uniquely determined—except by adding ad hoc restrictions. (Another illustration of the difficulty this causes is provided by the procedure for eliminating a cut, one of whose premises is introduced by thinning.) The uniqueness of normal forms in natural deduction can be explained, as far as the negative fragment is concerned, by two features of this calculus. The first is that there is a natural ordering of the inferences which constitute an N derivation, and this ordering is not affected by any of the reduction steps. The second is the fact that any branches made redundant by the application of a reduction are composed entirely of inferences
146
Normalization, Cut-Elimination and the Theory of Proofs
subordinate (in the sense of this ordering) to the ones being removed. In view of this, they may be pruned in their entirety without spoiling uniqueness. When we turn to the full calculus, it is no longer obvious that the ordering of inferences is entirely natural. In addition, it becomes necessary to allow reductions which alter this ordering. To preserve uniqueness, ad hoc restrictions must be placed on these. By itself, however, this is not sufficient because the branches made redundant in the process of normalization may contain inferences which are not subordinate. So, to ensure that the calculus possesses the second feature mentioned above, the meaning of "redundant" is altered by treating inessential applications of V- and 3-elimination as though they were essential.9 In NJpT and NKQT, this last translates into some ad hoc conventions concerning when the rule (LT) can be applied and how much of a redundant subderivation is to be pruned. In light of the preceding considerations, we can understand better why normal forms are not unique in LK and LKDT. TO begin with, there is in general no satisfactory way to order the inferences of a multiple-conclusion derivation. This can be seen from the example of (7.9), which can be interpreted as having been constructed either from II and IT A CV A C A
n" by applications of A-introduction and elimination, or from C
IT A CV A
n n"
B_ AAB A and A by V-elimination. Although there are a number of other similar examples, I will present only one more: 9
A
T h e branches of a derivation made redundant by a reduction are those which, if they are to be retained after the removal of some maximal formula occurrences, must be reattached by means of thinning. I argued earlier that applications of V-elimination which do not discharge assumptions in both minor premises and applications of 3-elimination which discharge no assumptions both involve the tacit use of thinning. If this interpretation is kept in mind and reduction in the N calculi is taken to consist of removing maximal formula occurrences together with the branches made redundant by their removal, then it can fairly be claimed that what it means for a branch to be made redundant in the negative fragment is not the same as in the full calculus.
Correspondence Results
147
Given ~ ^, let
rii n
n n4 n2 = E
F G
and
n 3 = j4 B_ C
Then iii _^4
n2 n3 n4 B_ = ]Z F_ C G
(provided that E is not among the conclusions of Iii, nor B among those of n 4 ). Of course, the inferences of an LK derivation can be ordered. The problem is that the ordering is rather artificial and can be changed radically— most notably by the upward permutation of cuts—in the course of cutelimination. In LKDT too, even when the order of inferences is determined by the structure of the derivation, it can often be reversed by applying (III). As a result, any reduction procedure (for LK or LKoT) which allows entire branches to be pruned as they become redundant will not yield unique cut-free or normal forms. It is however hard to envision natural reduction steps which prune enough to ensure the subformula property, but not so much as to destroy uniqueness. Certainly those reduction steps for LK which involve pruning, namely (Bl-3), do not fit this description. They correspond in LKpT to (11-3), respectively, followed by an application of (IIIAl) (in the cases of (II) and (I3b)) or (IIIA2) (in the case of (12)). In effect, they allow a segment of variable length to be removed from each redundant branch—the only constraint being that no open assumptions or conclusions are to be lost. It is obvious that this is too drastic for uniqueness. Furthermore, because the extent to which a branch is pruned depends upon the position of the inferences used to construct it relative to the cut whose complexity has been reduced (those above it being removed, while all others are retained), relatively insignificant permutations of inferences will alter significantly the effect of these reductions, and it is by no means clear that they have any claim to be called natural. If we contemplate replacing (IIIA1-2) by pruning reductions which are more systematic and will preserve uniqueness, there seems to be only one reasonable possibility: restrict (IIIAla) to the case in which II consists of a single introduction and, dually, (IIIB2) to the case in which II' consists of a single elimination. The drawback to this idea is that, if the connective being introduced is —•, II will not fit into the format of (IIIAla). To take account of this, (IIIAlb) must be replaced by (IIIA5). Now, if the somewhat trivial step (IIIA6). is also included, we have a group of pruning reductions which will ensure that normal derivations possess the subfor-
148
Normalization, Cut-Elimination and the Theory of Proofs
mula property. (In fact, _L may occur in a normal derivation even though it is not a subformula of any assumption or conclusion. For all practical purposes, however, the claim is true.) Furthermore, because no maximal formula occurrences are removed by these reductions, they do not by themselves threaten uniqueness and leave open the possibility of proving that the normal forms of a derivation are all equivalent in some suitable sense of the word. What vitiates this approach, of course, is the fact that (IIIA5) may introduce new open assumptions into the derivation on which it operates, and hence is unsuited to be a reduction step. The disappointing conclusion, therefore, appears to be that the only natural way to prune a normal form of an LKDT derivation II in such a way that it will have the subformula property and bear some structural resemblance to the other normal forms of II leads to an insuperable difficulty. The preceding remarks notwithstanding, by choosing an appropriate notion of normal form and placing sufficient restrictions on the reduction steps, it is clearly possible to prove for any of the calculi under consideration not only the uniqueness (up to some equivalence) of normal forms but also the termination of each reduction sequence in a normal derivation with the subformula property. The problem is that this will be an ad hoc procedure designed expressly for the purpose of obtaining these results. On the other hand, unless we are prepared to interpret 'natural' as 'natural relative to the rules of a particular system', it is not clear that there is any such thing as a natural reduction procedure. Despite a fundamental similarity between reduction in all the systems discussed above, we are faced with a bewildering number of choices about matters of detail which are decided for each particular calculus in what appears to be a reasonable way only by respecting its combinatorial peculiarities. As a result, these decisions often seem pointless and arbitrary when translated from one calculus into another. Furthermore, it is upon these apparently trivial decisions that the possibility of proving strong normalization and Church-Rosser type theorems depends. When we do come across a calculus like NJ for which such theorems hold with respect to a relatively straightforward set of reduction steps, it seems to be more a matter of combinatorial accident than a reflection of some profound truth about normalization. For this reason, it seems unwise to use (as Zucker appears to do) the normalization procedure for NJ as a kind of benchmark by which to judge other reduction procedures. Once we go beyond the negative fragment, no method of reduction stands out as privileged; they all appear to be more or less satisfactory compromises between competing requirements. An investigation of their formal properties does not provide sufficient grounds for choosing between them or assessing their wider significance. So, rather than pursuing such an investigation further in the hope of discovering some clue as to how the relationship between a derivation and its normal form(s) is to be interpreted, it might prove more fruitful to
Correspondence Results
149
consider directly various interpretations which have been suggested for the derivations of a formal system with a view to drawing up a set of criteria, independent of the rules of any particular formalism, by which to judge reduction procedures and proposals regarding their significance. Before turning to this task in the next chapter, I would like to conclude the present one with a brief discussion of a topic which, although peripheral to my main theme, is nevertheless of some interest—namely, the advantages of presenting classical logic in a multiple-conclusion framework. As I observed earlier, the rules of DT are all classically valid so that, for example, the sequent T h A is derivable in LK iff there is a derivation II G DT of A from T. Some of them, however, are not intuitionistically valid. In particular, both (5) and (10 v ) need to be restricted to obtain intuitionistic logic. It is rather a complicated matter to formulate such restrictions if the rules are to be applied downwards in a straightforward manner, although there are quite naturally generated subsets of DT which are adequate for intuitionistic logic—the most conspicuous example of one being LJDT> Nevertheless, it seems fair to say that these multiple-conclusion rules express more naturally the classical interpretation of the logical connectives than its intuitionistic counterpart. The situation is reversed in the single-conclusion case. It has often been remarked, for example, that NJ is extended to classical logic at the cost of a certain artificiality, or that NK is perhaps not "the proper way of analyzing classical inferences."10 Furthermore, NKDT can only be described as an arbitrary subset of DT- The claim I wish to defend here is that D? (or some variant of it) is the proper generalization of NJ to classical logic and that it is superior to NK as a natural system of classical deduction. In the first place, the rules of classical multiple-conclusion logic can be formulated in a completely explicit way without restrictions. (This was observed by Kneale and seems to have motivated, in part at least, his interest in multiple-conclusion derivations.) The only rule which does not conform to this description is (5) and, following Kneale, it could be replaced by:
< 5 '*
A-^TB
«"«
<5">
A^B
This is surely as it should be since we have been taught that classical inference depends only on the truth or falsity of statements, not on the manner in which they are established. (There is an additional advantage in doing away with improper inference rules if one believes that the meanings of the logical connectives are defined by their associated rules: such definitions will now all be explicit ones.) A second point to consider is that (with some minor qualifications) every 10 See Gentzen's "Investigations into Logical Deduction," or Prawitz's "Ideas and Results in Proof Theory." The quotation is taken from the latter.
150
Normalization, Cut-Elimination and the Theory of Proofs
derivation in DT can be reduced to a normal form having the subformula property. This is in contrast to NK for which a theorem of this kind holds only in the negative fragment.11 It is perhaps worth noting, however, that a similar result can be proved for a single-conclusion variant of NK—or at least for its propositional part. I pointed out earlier that, if the following rule is substituted for the law of excluded middle or the classical negation rule in NK, the resulting system of rules will, under the appropriate interpretation, generate a more natural subset of DT'. (7.13)
[X]
n c_
[-iX]
c
ir c_
Let the calculus obtained by adding this rule to NJ, be called NK'. As far as I know, NK' has not been much studied. 12 It is obvious that NK' is adequate for classical logic (since, for example, the law of excluded middle is trivially derivable using the above rule). Furthermore, a strong normalization theorem is provable for the propositional part of NK'. This is because nothing is lost (as far as propositional inferences are concerned) by requiring X to be atomic in the above. I show that this is so below. The four figures which follow can be interpreted either as the cases in an inductive argument to show that NK1 is deductively equivalent to the system obtained by restricting the above rule to atomic X, or as a set of negation reductions according to which (7.13) reduces to the figure shown when X has the appropriate form. (1)
X =
AAB [AAB] A 1 i(i4AS) IT C_
hA] [A] [B] AAB n C_
hB]
[AAB] B_
± -i(;4 A B) IT
c_
c__ c
11
It was Smiley and Shoesmith who first observed this advantage of multiple-conclusion natural deduction and, in Chapter 20 of their Multiple-Conclusion Logic, proved a normal form theorem for the classical predicate calculus which made use of it. 12 Sundholm however mentions it in his article, "Systems of Deduction," where it is called the rule of non-constructive dilemma. (See Vol. I of the Handbook of Philosophical Logic edited by F. Giinthner and D. Gabbay, Dordrecht, 1983.)
Correspondence Results
(2)
X = AVB [A] l^A] [B] [-B] ± _L 1 ->{AVB) W C
[AVB] \A) AVB n C C
[B] AV B
n c (3)
X =
c
A-+B
[4
[A^B] B
[-,B] [A]1 [-^4] 1 J. ^(A - B) B ir ~A[^ B
[B] A^B
n c (4)
151
c
c
n c
c
X = ^A [A] hA] ->->A
TV
c
[->A]
n c
An additional permutative reduction is needed to take account of maximal segments produced by (7.13), but this creates no problem and the usual proof of normalization goes through virtually unchanged. 13 The attempt to extend the above to quantifiers founders on the following difficulty: the analogues of (l)-(4) do not hold for the usual introduction and elimination rules for V and 3; on the other hand, they do hold trivially with respect to the rules (9Q) and (10g). These latter rules, however, pose certain problems for normalizability, so it is by no means clear that there is a reasonable formulation of the quantifier rules with respect to which a normalization theorem will hold.] My final reason for maintaining the superiority of DT is in some ways the most important: just as Gentzen claimed that the derivations of NJ had a "close affinity to actual reasoning," so I claim that classically valid principles can be derived in rather a natural, concise and straightforward 13 There also appears to be no difficulty in extending the "convertibility" proof of strong normalization to the present context, but I have not checked this in detail.
152
Normalization, Cut-Elimination and the Theory of Proofs
way using the rules of DT- TO illustrate this fact, I present two sample derivations, the first of (A —» B) V (B —> A) and the second of {A - ^ ( B V C)) -* ({A - • B) V (A - • C)). For the sake of legibility I will not bother to write in subscripts or identify the rules used to justify each step: [B] A-*B (A-*B)vA B (A-> # ) V ( B - » A) B (4 - • B) V ( £ -> A)
[A]*
[A -> (B V Q] * BVC
B A^B*
C A^C ( A - •B)V(A-+ C) ( A -• B) V (A -» C) ( B V C ) ) ( ( A B V v (A - C)) t (A (* and f indicate which assumptions are discharged by which inference.) These derivations compare very favorably in terms of length and intelligibility with their NK counterparts, as do many other DT derivations. (The relationships between universal and existential quantifiers, for example, can be established in just a few lines in DT) It seems fair to conclude from all this that multiple-conclusion logic has more than mere curiosity value.
8
Interpretations of Derivations If the analogy between the derivations of a logical calculus and the terms of a calculus of functions is taken seriously, it leads naturally to the idea that interreducible derivations represent the same proof.1 This is so not only because it suggests the possibility of a strong normalization theorem for derivations (which, as Prawitz has pointed out, gives a certain coherence to the idea) but also because reduction in term calculi has traditionally been used to analyze the equality relation between terms and, by extension, the identity relation between the objects which they denote. Church, for example, states quite explicitly that the substitution of interreducible terms for one another in an expression leaves its meaning unchanged. This implies at least that such terms must have the same denotation under any interpretation. He also points out, however, that the notion of difference in meaning is a vague one, that there is a range of distinct identity criteria for functions and that it is not always possible to distinguish between these by means of the reductions he considers.2 Prawitz seems to have been the first author to suggest in print that the identity relation on proofs could be characterized in terms of reductions between derivations, although the idea that there is a connection between proofs and functions, or a formal analogy between the derivations of a logical calculus and the terms of a calculus of functions, goes back quite a long way. Godel, for example, observed that the concepts of computable function of finite type and intuitionistically correct proof may be used interchangeably in certain contexts. 3 Furthermore, Curry and Feys noted a striking analogy between the theory of functionality and the positive implicational calculus, which they 1
See, for example, page 249 of Prawitz's paper "Philosophical Aspects of Proof Theory" in Volume I of Contemporary Philosophy (ed. by G. Floistad, The Hague, 1981). 2 The Calculus of A-Conversion, Princeton, 1941. See page 15 and pp. 2-3. 3 "Uber eine bisher noch nicht beniitzte Erweiterung des finiten Standpunktes," Dialectica, Vol. 12, 1958, p. 283; reprinted with translation in Vol. II of his Collected Works (ed. by S. Feferman et ai, Oxford, 1990). 153
154
Normalization, Cut-Elimination and the Theory of Proofs
exploited to prove a normal-form theorem for combinators. 4 Another connection was made by Tait. 5 He adapted a method used to prove cutelimination for derivations involving induction principles to analyze the computation of functionals involving definition by recursion (i.e., to prove a normal-form theorem for a certain set of terms). Howard, building on the ideas of Curry and Tait, extended the analogy from the positive implicational calculus to Heyting arithmetic and indicated how to establish a normalization theorem for the associated calculus of terms. 6 The method he suggests for proving such a theorem is due to Tait; it involves the notion of a convertible term, which was introduced by the latter to analyze the computable functionals of finite type, in the sense of Godel, as well as a certain extension of them. 7 This method was also used to good effect by various contributors to the Proceedings of the Second Scandinavian Logic Symposium, in particular, by Girard, Martin-Lof and Prawitz, all of whom adapt it to prove normalization theorems for the derivations of a variety of systems. 8 Prawitz argues that the derivations of Gentzen's N calculi represent first-order proofs (and, conversely, that each such proof can be represented by an N derivation), and then goes on to conjecture that two derivations represent the same proof if and only if they are interreducible. There are a number of points to notice about this conjecture. The first has to do with what it means for a proof to be represented by a derivation. It is quite clear that for Prawitz proofs are abstract objects whose relationship to their representations is analogous to that between propositions and the sentences which express them. It follows that to ask when two derivations represent the same proof is much like asking when two sentences have the same meaning. 9 Whether this is an appropriate analogy and how it is related to the one mentioned in the preceding paragraph will be considered later. For the present, I want to draw attention to a second point about the conjecture: its tentative nature. Of course, it is advanced only as a possible answer to the question of when two derivations represent the same proof, and no conclusive evidence if offered in support of it. There is, however, 4
Combinatory Logic,Volume I, Amsterdam, 1958, page 312ff. "Infinitely Long Terms of Transfinite Type," Formal Systems and Recursive Functions, 1965, page 177. 6 "The Formulae-As-Types Notion of Construction"; this has circulated as a manuscript since 1969, but was first published in the volume of essays To H.B. Curry, ed. by Seldin and Hindley, New York, 1980. 7 "Intensional Interpretations of Functionals of Finite Type I," Journal of Symbolic Logic, Vol. 32, 1967, pp. 198-212. 8 Prawitz's contribution to this volume forms the basis for my discussion of his views in the text. The conjecture about the identity of proofs is to be found on page 257. He states on page 261 that it "is due to Martin-Lof and is also influenced by similar ideas in connection with terms" which are to be found in Tait's 1967 paper referred to above. 9 See, for example, page 237 of "Ideas and Results in Proof Theory"—or "On the Idea of a General Proof Theory," Synthese, Volume 27, page 68. 5
another sense in which it may be considered tentative: even if it should turn out that identity of proofs can be characterized in terms of reducibility, the conjecture as stated above is clearly in need of certain refinements. Firstly, derivations that only differ with respect to proper parameters should obviously be counted as equivalent. Secondly, one may ask whether the expansion operations also preserve the identity of the proofs represented. It seems unlikely that any interesting property of proofs is sensitive to differences created by an expansion. 10 These are just examples of matters of detail which need to be settled before the conjecture can be put in a definitive form. Two further ones are provided by questions concerning immediate simplifications and the permutative reductions. Immediate simplifications allow the removal of redundant applications of ∨- and ∃-elimination as well as of the classical negation rule; such applications obviously "constitute unnecessary complications" in a derivation, but their unrestricted removal destroys the uniqueness of normal forms.11 As for the permutative reductions, there seems to be no particular reason—apart from expediency—for restricting their application to cases where it results in diminishing the length of a maximal segment. If, for example, the major premise of an elimination rule can always be permuted upwards past an application of ∨- or ∃-elimination, this will not affect the strong normalization theorem. 12 If, however, no restriction is placed on these reductions, not every reduction sequence will terminate (as Zucker's example shows). What is at stake in all these cases is the exact definition of the reduction relation. Even after stipulating that interreducibility is to be, roughly speaking, the equivalence relation generated by an agreed upon set of proper reductions, we are still obliged to decide a number of rather picayune questions before we can fix it precisely. Furthermore, if the claim that this relation has significance outside the confines of a formal system is to have any credibility, we must exhibit sound reasons for our decisions. It is, however, difficult even to imagine what kind of evidence would settle conclusively issues as small as these. This leads me to my final point about the conjecture, namely, the evidence offered to support it. To substantiate the claim that derivations which reduce to the same normal form represent the same proof, Prawitz appeals to what he calls the inversion principle: "a proof of the conclusion of an elimination is already 'contained' in the proofs of the premises when the major premise is inferred by introduction." He infers from this that "a proper reduction does not affect the identity of the proof represented." 13
Prawitz, "Ideas and Results in Proof Theory," page 257. op. cit, pages 254 and 249. 12 op. cit, page 253. 13 op. cit, pages 246 and 257.
11
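To make the reduction step that the inversion principle is meant to license concrete, here is the familiar proper reduction for implication, written schematically (a standard textbook example, not one quoted from Prawitz's text; the display assumes amsmath):

```latex
% Proper reduction for ->: when the major premise A -> B of an
% ->-elimination has just been inferred by ->-introduction, the
% detour is removed by substituting the derivation Pi_2 of the
% minor premise A for the discharged assumptions [A] in Pi_1.
\[
\dfrac{\dfrac{\begin{matrix}[A]\\ \Pi_1\\ B\end{matrix}}{A \to B}
       \qquad
       \begin{matrix}\Pi_2\\ A\end{matrix}}
      {B}
\quad\rhd\quad
\begin{matrix}\Pi_2\\ A\\ \Pi_1\\ B\end{matrix}
\]
```

Both sides end with the same conclusion B from the same open assumptions; the claim under discussion is that they also represent the same proof.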
Of course, as Prawitz himself is quick to point out, this does not imply the truth of the above claim because it makes no mention of the permutative reductions, but although he acknowledges the possibility that these might be problematic, he does not seem to take it very seriously. As it stands, however, the inversion principle does not even allow us to conclude that proper reductions leave unchanged the proof being represented; it tells us only that the two derivations in question (before and after a reduction) have the same conclusion. To supplement it, we need an explanation of the relationship between introductions and eliminations which will guarantee that the proofs represented by such pairs of derivations are the same. An explanation of the appropriate kind is provided by Prawitz's ideas about the validity of arguments. These are sketched in his "Ideas and Results in Proof Theory" and elaborated in subsequent papers. 14 Prawitz suggests that the introduction rules of NJ can be interpreted as expressing the constructive meanings of the logical particles in operational terms, while the inversion principle provides a justification of the elimination rules in terms of these meanings. According to him, the derivations of a formal system are incomplete exemplars of valid arguments; each such derivation needs to be supplemented by operations justifying the inferences of which it is composed. On the above interpretation, introduction inferences are self-justifying; eliminations, on the other hand, do require justifying operations, and these are provided by the proper reduction steps. He goes on to claim that "such an operation which is supposed to justify an inference expresses the meaning of this inference." So, if d contains an elimination whose major premise is maximal and d' results from d by applying a reduction which removes this maximal formula occurrence, "the meaning of the elimination inferences is expressed by saying that an argument [of the form d] . . . represents the same proof as represented by [one of the form d']." d! is "obtained by removing the elimination inference in question according to its meaning and [d and d') are intentional (sic) identical." 15 In more detail: Prawitz considers trees of formulae of a fixed language with some additional information; these he calls arguments. An inference is essentially a means of constructing new arguments from ones already given. For simplicity, assume that the language is a propositional one, then the additional information need only enable us to distinguish between open and closed occurrences of assumptions, and to associate with each closed one a unique inference. Arguments are just derivations in the abstract, as opposed to the derivations of a particular formal system. Prawitz is interested in the question of what constitutes a valid argu14 See, in particular, "Towards a Foundation of a General Proof Theory" (Logic, Methodology and Philosophy of Science IV, ed. by P. Suppes, Amsterdam, 1973) and "On the Idea of a General Proof Theory" (Synthese, Vol. 27, 1974). 15 "Towards a Foundation of a General Proof Theory," page 234.
ment (when does it represent a proof, when does its conclusion follow from its assumptions) and answers it by elaborating on an idea he attributes to Gentzen. (Writing about his systems of natural deduction, Gentzen remarks, "The introductions represent, as it were, the 'definitions' of the symbols concerned, and the eliminations are no more, in the final analysis, than the consequences of these definitions. . . . In eliminating a symbol, we may use the formula with whose terminal symbol we are dealing only 'in the sense afforded it by the introduction of that symbol'." 16) The conditions under which a (logically complicated) statement can be asserted are taken to constitute the meaning of its principal connective, and these conditions are expressed in terms of rules of derivation, or inferences. For example, the meaning of conjunction is expressed by an inference which transforms arguments for A and B into an argument for A ∧ B. Similarly, the meaning of implication is expressed by an inference which transforms an argument for B from A into an argument for A → B. Such inferences are said to be canonical. By definition, the result of applying a canonical inference to valid arguments is itself valid; arguments of this form are also called canonical. The meanings given to the connectives may also justify the inclusion of non-canonical inferences in a valid argument, where a justification for an inference I will be an operation which transforms the result of applying I to canonical arguments into a valid argument for the same conclusion. Not only are canonical arguments valid, therefore, but also those arguments which can be converted into canonical ones by means of the justifying operations. Furthermore, if we interpret an open argument as claiming that its conclusion follows from valid arguments for its (open) assumptions, it is reasonable to allow such an argument to be valid whenever all its closed instances are. (A closed instance of an argument is the result of replacing its open assumptions by closed valid arguments for them.) These conditions are sufficient to characterize validity, assuming we know what it means to be a closed valid argument for an atomic sentence:
(1) An argument is valid relative to a set B of atomic arguments iff all its closed instances can be converted into canonical arguments or members of B by means of the justifying operations.
It is then a trivial matter to show:
(2) An argument is valid in the sense of (1) iff it can be generated from assumptions and members of B by means of canonical and justifiable non-canonical inferences.
A notion of strong validity can also be defined by substituting "strongly valid" for "valid" in the explanations of canonical argument and closed instance, and insisting in (1) that every (sufficiently long) sequence of justifications
16 The Collected Papers of Gerhard Gentzen, page 80.
terminates in a canonical argument or a member of B. (This is not quite Prawitz's definition, but see below.) Specializing the preceding discussion to NJ, it is obvious that the introduction rules constitute canonical inferences, while the eliminations, although non-canonical, are justified by the various proper reductions. (2) is nothing more than the claim that all the derivations of NJ (relative to a system B of atomic derivations) are valid. 17 Strong validity for derivations is a variant of Tait's notion of convertibility for terms. The idea of interpreting reduction steps as defining conditions derives from this same source, although in Tait they define the operations associated with the introduction of various constants, rather than the meanings of elimination inferences. Not surprisingly, strong validity provides a tool for proving that derivations must reduce to a normal form, and is so used by Prawitz in the appendices to "Ideas and Results in Proof Theory." In Appendix A, he considers first-order logic and shows without difficulty that every way of reducing a strongly valid N derivation must terminate. A more complicated argument is required to show that every N derivation is strongly valid. To make it go through Prawitz employs an inductive definition of strong validity which differs from the one given above. What he wants to say is that a derivation terminating with an introduction is strongly valid when the derivations of the premises of its final inference are, while a derivation terminating in an elimination is strongly valid when every derivation to which it reduces in one step is strongly valid. But, he is obliged to stipulate in addition that irreducible derivations are strongly valid to ensure that this notion is well-defined. (Validity is defined similarly except that a derivation which terminates with an elimination is valid when there is some valid derivation to which it reduces in one step.) On the basis of the ideas explained above, however, there is no justification for assigning a special status to normal derivations. They are no more obviously valid or strongly valid (in the earlier sense) than derivations in general, and making them so by definition is to sacrifice the intuitive content of these notions for the sake of technical expediency. It is in fact the lack of an explanation for the significance of normal forms which makes it difficult to interpret normalization theorems in terms of validity. Nonetheless, Prawitz attempts to do so. For example, he writes, "The significance of the strong normalization theorem is very clear from the present point of view. To carry out a reduction is essentially to replace a definiendum by its definiens. . . . The strong normalization theorem then says that the arguments in question are well-defined in the sense that each way of successively replacing definiendum by definiens will finally terminate." 18 It seems much more reasonable, however, to say that an argument is well-defined
17 Cf. the argument on page 287 of "Ideas and Results in Proof Theory."
18 "On the Idea of a General Proof Theory," page 76.
if applying the definitions in any order will eventually yield a determinate argument not involving any defined operations. In general, this will mean considering not just the argument in question, but also its closed instances, subarguments of those instances, etc. In other words, an argument is welldefined iff it is strongly valid, and the analogue of (2) above for strong validity expresses the claim that every argument is well-defined. Notice that this differs from the statement of the strong normalization theorem and makes no reference to normal forms. Whatever the interest of these ideas, they do not provide conclusive evidence for the claim in question. In the first place, they depend upon assigning a very narrow interpretation to the logical connectives—narrower than what is usually taken to be the constructive one, and require a treatment of classical arguments which reduces them to particular cases of intuitionistic ones. 19 Furthermore, even granted this interpretation, it still does not follow that interreducible derivations represent the same proof. In particular, the notion of validity which Prawitz adopts is not uniquely determined. Given this notion, it is possible to argue (as he does in the quotation above) that the meaning of an elimination can be expressed by its associated reduction step and, a fortiori, that such steps preserve the identity of proofs. On the other hand, assuming that the reductions have this property, it is possible to justify the definition of validity (as Prawitz does, for example, on page 285 of "Ideas and Results in Proof Theory": " . . . in view of the conjecture about identity between proofs, a derivation that reduces to [a canonical one] .. .shall also be counted as valid."). The interdependence of these ideas, however, makes it unlikely that doubts about the one can be dispelled by an appeal to the other. All this notwithstanding, Prawitz refers to this half of the conjecture as unproblematic, and his opinion seems to be shared by most commentators. Kreisel, for example, claims that it is evident simply by inspection that such reductions do not change the proof described. 20 Apparently, the only dissenting voice belongs to Feferman who, in his review of "Ideas and Results in Proof Theory," objects that information may be lost in the process of reduction. The particular example he considers is a derivation D in arithmetic whose last inference is an application of V-elimination with a maximal premise and a closed atomic conclusion A(t). He points out that the normal form of D "will simply give a computation which verifies A(t); in this case every abstract idea in the derivation may be lost." 21 If the relationship between derivation and proof is in fact a special 19
"On the Idea of a General Proof Theory," page 70. "A Survey of Proof Theory II," page 112. 21 Journal of Symbolic Logic, Vol. 40, 1977, page 234. 20
case of that between a linguistic expression and its meaning or content, this objection seems undeniable. The two derivations are clearly not just linguistic variants of one another, and to distinguish between information and content in this context would be nothing more than a quibble. Prawitz's response to this objection is of interest. 22 After conceding the point about loss of information, he goes on to observe "that on the view presented here, a proof is . . . the result of applying certain operations to obtain a certain end result," and claims that this makes it difficult to deny the identity of proofs represented by interreducible derivations. According to him, the real issue is whether proofs can indeed be identified with objects of this kind. It might be more appropriate, however, to question whether a proof thought of in this way can be identified with the meaning of the derivation which represents it. I propose to consider this last question below, even though Prawitz does not do so. There is less to be said about the other half of the conjecture, that any two representations of a proof are interreducible. Prawitz does not even attempt to argue for it and merely refers the reader to Kreisel's "A Survey of Proof Theory II" for a discussion of "the possibility of finding adequacy conditions for an identity criterion such as the one above." 23 As one might imagine from this description, the discussion turns out to be somewhat inconclusive. Kreisel endorses the doctrine that derivations are the linguistic expressions of proofs, but he gives it a psychological twist by identifying these latter with mental acts. 24 Furthermore, he takes it for granted that all the reduction steps under consideration preserve the identity of proofs. His idea is that, if he can find a "mathematically manageable" condition M(d₁, d₂) which follows (in some informal sense) from the claim that d₁ and d₂ represent the same proof, it might be possible to show:
(*) M(d₁, d₂) implies that d₁ and d₂ reduce to the same normal form.
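As a schematic illustration of the reduction relation at issue (a standard example, not one of Kreisel's), consider two derivations of B from the assumption A ∧ B: the direct one-step derivation by ∧-elimination, and a derivation that takes a detour through ∧-introduction. The detour is removed by a single proper reduction, so the two derivations are interreducible and share the first as their common normal form (the display assumes amsmath):

```latex
% Two derivations of B from A /\ B: the second contains a maximal
% occurrence of A /\ B (introduced and immediately eliminated) and
% reduces to the first by one proper reduction.
\[
\dfrac{A \wedge B}{B}
\qquad\qquad
\dfrac{\dfrac{\dfrac{A \wedge B}{A} \qquad \dfrac{A \wedge B}{B}}{A \wedge B}}{B}
\]
```

On the assumptions Kreisel takes for granted, such a pair represents the same proof, so an adequate condition M(d₁, d₂) ought in particular to hold of it.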
Recall that removing the last inference from a closed normal derivation d of an existential formula, (∃x)A(x), transforms it into a derivation of A(t_d) for some term t_d; this term may be written as t_{d′} when d′ is a derivation whose normal form is d. Also, given a formal system F, an assignment A_F of F terms to parameters and closed F derivations to their conclusions, and an arbitrary derivation d, let A_F(d) be obtained from d by replacing the parameters occurring in its open assumptions and conclusion with F terms, and the open assumptions of the derivation which results with closed F derivations—all in accordance with the assignment A_F. Kreisel restricts his attention to derivations of the predicate calculus with existential conclusions. Let d₁ and d₂ be such derivations (of the same conclusion from
22 See pp. 249-250 of "Philosophical Aspects of Proof Theory."
23 "Ideas and Results in Proof Theory," page 257.
24 Kreisel, op. cit., page 111. According to him, words in general express mental entities—namely, thoughts.
the same assumptions), then he suggests the following as a possible choice for M(d₁, d₂): For all extensions F of predicate logic for which normalization holds and all assignments A_F, the terms t_{A_F(d₁)} and t_{A_F(d₂)} define extensionally equal functions (presumably, in all interpretations of the appropriate kind). It seems obvious that, if d₁ and d₂ represent the same proof, M(d₁, d₂) ought to hold. Unfortunately, however, the converse is not true (as Kreisel himself is quick to point out) and, as a result, (*) fails too. The problem is that we can easily construct an existential statement all of whose derivations will supply the same instantiating term. The particular example Kreisel considers is of the form (∃x)(x = c ∧ P), where P is independent of x. 25 Clearly, any derivation d of this statement will be such that t_d = c, no matter how P is obtained. In view of the above, Kreisel describes M(d₁, d₂) as providing only a partial criterion, and takes pains to emphasize that this does not invalidate it. Perhaps his general point, that a partial criterion may be both useful and interesting, is correct, but it is of little relevance to the present case. M(d₁, d₂) applies only to some derivations, but we have no idea how to characterize this subclass; it needs to be supplemented by other conditions to increase its range of application, or to make its limitations explicit, but we have no idea what these might be. (In fact, there is no particular reason to believe that such conditions—i.e., ones which do not supplant M(d₁, d₂), but complement it—exist.) In short, not only does (*) fail to hold, but M(d₁, d₂) seems to have no interesting consequences. Kreisel too dismisses M(d₁, d₂), albeit on general grounds. For him, (*) has the form of an adequacy condition which the interreducibility relation might satisfy (amongst others) before it can be accepted as characterizing identity between proofs. The example discussed above indicates that it is not applicable to all derivations of existential formulae. What disturbs Kreisel is that, even if (*) could be shown to hold whenever d₁ and d₂ actually represented the same proof, we would have managed only to evade rather than solve questions about the nature of the identity of proofs or, indeed, about the nature of proofs. For what we propose to show, for specific derivations, is that questions of the adequacy of [interreducibility] can be settled without closer analysis of the concepts involved.26 Again, one may grant his general point but question the emphasis he places on it. No one pretends that, if (*) could be shown to hold, it would provide us with an understanding of the concepts involved sufficient to render
25 Kreisel, op. cit., page 116.
26 Kreisel, op. cit., page 117.
further analysis unnecessary; on the other hand, for almost any reasonable choice of M{d\,d2), such a result would be of interest and would at least advance our understanding of them. It seems perverse to criticize M(di,d,2), not because it fails to coincide with any interesting relation between derivations or proofs, but because such a relation would not be fully explained even if it did. Having reached this negative conclusion, Kreisel claims only a pedagogic value for his discussion. Apparently, our ability to recognize the shortcomings of his proposed criterion serves as a lesson to those who assume "that 'nothing' precise (and reasonable) can be done on questions about synonymity of proofs." 27 Whatever contribution may have been made to pedagogy, however, no new evidence has been adduced in favor of Prawitz's conjecture. The net result of our efforts to discover such evidence has in fact been rather disappointing. The analogy between derivations and terms, which suggested the conjecture in the first place, remains the most persuasive argument in favor of its first part, whereas its second part has not been substantiated in any way. We do not even know where to look for evidence which might support this latter. Before drawing any conclusions from this state of affairs, I will consider one other attempt to vindicate Prawitz's conjecture, albeit at the cost of reinterpreting it. The attempt is made by Martin-L6f in "About models for intuitionistic type theories and the notion of definitional equality." 28 As its title suggests, much of this paper is devoted to the analysis of a relation which Martin-Lof calls definitional equality. According to him, it "is used on almost every page of an informal mathematical text" (page 93), may reasonably be identified with Frege's equality of content (page 104) and enters into the definitional schemata of the primitive recursive functionals of Godel's theory T (pages 105-6). All of these claims may be called into question, however, as may the mathematical interest of this relation. Martin-L6f distinguishes between informal notions and their formal counterparts, between, for example, propositions and formulae, proofs and derivations, and mathematical objects and terms. Corresponding to the formal relation of convertibility (he calls the symmetric closure of the reduction relation between terms by this name) is the informal one of definitional equality. It might be thought that the distinction between informal and formal corresponds to that between abstract objects and their linguistic representations, but this is apparently not the case. Definitional equality is explicitly described as a "relation between linguistic expressions and not between the abstract entities they denote and which are the same." (page 93). As 27
27 Ibid.
28 Proceedings of the Third Scandinavian Logic Symposium, ed. by S. Kanger, Amsterdam, 1975, pp. 81-109.
for the other informal notions, they too seem to be regarded as linguistic items. Martin-Lof asserts, for example, that definitionally equal propositions are notational variants of the same abstract proposition (page 94) and contrasts proofs thought of as linguistic expressions with abstract proofs (page 104). His usage suggest that, while terms like "proof" and "proposition" may refer either to linguistic or to abstract objects, he intends the former when using them without qualification. It seems to follow that mathematical objects must also be expressions of an informal language. This is consistent with Martin-Lof's account of models which occupies the first half of the paper, although it is not clear whether he really wants to say that a model is a linguistic structure. 29 Although a conception of mathematics according to which its objects are certain expressions of an informal language seems sufficiently original to deserve a new name—informalism suggests itself as a suitable candidate, Martin-Lof associates it with the intuitionistic tradition. (He never discusses how, if at all, such a conception can be reconciled with Brouwer's oft repeated insistence that mathematics "neither in its origin nor in the essence of its method has anything to do with language." 30 ) A connection between his views and other attempts to explicate intuitionism becomes apparent once it is realized that Martin-Lof's interest in informal languages derives exclusively from their being interpreted. This enables him to talk about the expressions of such languages where other authors would talk about meanings in the abstract. Whether his preference for the former idiom over the latter is motivated by philosophical or pragmatic considerations is never made clear, but he does exploit it for his own advantage, most notably in the characterization of definitional equality. Since informal languages, unlike formal ones, come ready interpreted, each informal notion can be used to give meaning to the corresponding formal one. In other words, the distinction between formal and informal, although entirely on the level of language, can be correlated with that between (formal) syntax and semantics. Definitional equality is characterized as the least equivalence relation containing certain equations between expressions called definitions and closed under the following rule: 29 T h e alternative is to regard it as an abstract one, but to interpret formal theories in terms of the (meta-)linguistic description of the model, rather than in the model itself. There is perhaps no essential difference between these two approaches. In both cases we are presented with a formal language, a system of abstract objects, and a description of the latter—in English, let us say. The formal language is given meaning by correlating its well-formed expressions with certain expressions of English and thus, indirectly, with the abstract notions they describe. Whether we reserve the term "model" for the abstract system or apply it to the fragment of English which represents this system seems to be just a terminological matter. 30 T h e quotation is taken from a paper entitled "The Effect of Intuitionism on Classical Algebra of Logic" (Proceedings of the Royal Irish Academy, A 57, 1955), but the same sentiment is expressed by Brouwer in countless other places.
(†) If a and b are definitionally equal and c(x) is any expression, then c(a) and c(b) are also definitionally equal.
(Here "a" and "b" are supposed to denote expressions while uc(a)" denotes the result of substituting the expression a for free occurrences of the variable x in c(x).) No explanation is offered as to what constitutes a definition, and the remaining conditions are justified by an argument to the effect that they are the minimal ones needed if definitional equality is to serve Martin-L6f's purposes. A noteworthy feature of this relation is that it is not closed under the condition: (J) If a and b are definitionally equal, so are Xx.a and Ax.b. Martin-Lof's formulation of definitional equality glosses over the difficulties associated with this relation. In the first place, it is not clear how definitions are to be distinguished from other equations. Implicit in his discussion is the view that they are more than a means of introducing notation. (Cf. for example, the introduction of 6 ( a i , . . . , an) on page 85.) On the other hand, for two expressions to be definitionally equal it is not sufficient that they denote the same object. (The rejection of (J), if nothing else, makes this clear.) The tacit understanding is that the expressions on either side of a definition have the same meaning, but this is nowhere stated explicitly. The reason most likely is that Martin-Lof wishes to avoid that quagmire of philosophical discussion having to do with synonymy and the nature of meaning. His attitude seems to be that, since we use definitions in informal contexts, we must have a sufficient understanding of what they are. By formulating definitional equality as a purely syntactic relation we can avoid the need to analyze this understanding; it is enough to chronicle our usage. A second point concerns the closure conditions on definitional equality. If this relation is supposed to be something like sameness of meaning, it must be an equivalence and may plausibly be claimed to satisfy (f). What is less obvious is that these are the only conditions it need fulfill. MartinLof treats the question of which operations preserve definitional equality purely as a technical one. He therefore feels entitled to reject (|) on the basis of the principle of sufficient reason. This seems innocuous enough so long as we think only in syntactic terms. Consider however that, by doing so, he is in effect maintaining the following: "the result of applying the operation 7 to a" may not mean the same as "the result of applying the operation 7 to /?" even though "a" and "/?" have the same meaning. Perhaps this is true, but it is rather a counterintuitive claim which is not supported by anything in our informal experience. Furthermore, there seems to be no way to substantiate it other than by showing it to follow from some convincing account of meaning. (A satisfactory determination
of its status is of some importance since one of Martin-Löf's major points is that the formal analogue of (‡) should not be part of the definition of convertibility.) Martin-Löf pretends that definitional equality is a familiar relation, and hence not in need of explanation or justification. This does not, however, prevent him from insisting that it has particular properties which are obviously left undecided by our intuition. According to Martin-Löf, definitional equality is the standard or intended interpretation of the convertibility relation. This means, in the case of the theory of combinators, that a privileged position is assigned to weak equality and hence to weak reduction. In the case of the λ-calculus, the "correct" reduction relation is the image of weak reduction under the usual mapping between combinatory and λ-terms, which is of course weaker than λβ-conversion.31 The implications of the above for natural deduction are that the permutative reductions, immediate simplifications and expansions cannot be part of the convertibility relation which correctly formalizes definitional equality between proofs (pages 100-1). More importantly, the proper reduction steps must be weakened so as to apply only to the final inferences of open subderivations (page 96), where an open subderivation of d is one such that none of its open assumptions are closed in d. We are now in a position to appreciate Martin-Löf's discussion of Prawitz's conjecture. In his opinion it admits of two possible interpretations. According to the first, proofs are linguistic objects whose sameness is a matter of definitional equality. If we accept the foregoing analysis, the conjecture becomes true on this interpretation once the convertibility relation is restricted in the manner indicated above. According to the second interpretation, proofs are abstract objects and to say they are the same means simply that they are identical. Martin-Löf's comment here is that "there seems to be little hope of proving the conjecture in this form unless identical is replaced by provably identical" (page 104). As for the relationship between convertibility and provable identity, he believes it to be settled by his own earlier discussion in which he argues that, for a variety of theories, the former implies the latter but not vice versa.32
31 For an explanation of these notions, see for example Introduction to Combinators and λ-Calculus by J. R. Hindley and J. P. Seldin, Cambridge, 1986.
32 Reduction rules were introduced originally as a means of analyzing equality. It is scarcely surprising therefore that convertibility, on any definition of this relation, implies equality in all formal contexts. The converse does not hold in general for convertibility as understood here. It is relatively easy to find a theory T and terms t₁, t₂ such that T ⊢ t₁ = t₂ even though neither converts to the other. Take, for example, t₁ and t₂ to be ⟨p(z), q(z)⟩ and z (where p and q are the left and right projection functions, respectively, and z is a variable—all of the appropriate types) in a type theory which, while not including the rule: ⟨p(z), q(z)⟩ reduces to z, contains axioms of the form
∀x^σ ∀y^τ A(⟨x, y⟩) → ∀z^{σ×τ} A(z)
These together with the identity axioms and the usual defining equations for p and q
Despite their definiteness, there is something unsatisfactory about these conclusions. In the first place, the interpretation which is dismissed on the grounds that it makes the conjecture unprovable is probably the intended one—and certainly the most interesting. Furthermore, a statement may be worth investigating even though it does not admit of direct proof. In the present case such an investigation might actually involve establishing the relationship between convertibility and identity in some formal theory, provided that the theory in question could plausibly be claimed to capture significant properties of the identity relation between proofs. (The result would be of doubtful interest otherwise.) This claim, however, is no more likely to be provable than the original conjecture. It seems overly optimistic, therefore, to suppose that the status of the conjecture depends solely upon the answer to a technical question—especially because, once the meaning of identity between (abstract) proofs has been settled, there still remains the definition of convertibility. Martin-Lof's characterization of this latter relation is certainly the most controversial aspect of his treatment. Acceptance of it would oblige us to revise our most basic ideas about normal forms and their significance—most notably, the idea that normal derivations have to be direct. A derivation which is irreducible in MartinLof s sense may lack the subformula property. In fact, no bound (expressed in terms of the complexity of its assumptions and conclusions) can be placed on the complexity of the formulae occurring in such a derivation. 33 enable us to derive < p(z), q(z) > = z. Because the terms on each side of this equation are irreducible, however, we cannot prove that one converts to the other. The situation for closed terms is slightly different. As Martin-L6f points out, a closed derivation of an equation will usually yield a means of converting its terms into one another. So, we can at least state a partial converse of the above, namely: t\ converts to ^2 if there is a closed derivation of t\ — ti for a range of familiar theories satisfying the normalization theorem. The preceding settles the relationship between provable identity and convertibility only if one accepts that Martin-Lof's characterization of the latter is, as he asserts, the correct one. 33 Consider the following derivation: [A]
                              [A]
        -------------------------------------  (→-introduction, repeated)
        A → (A → ... (A → A) ...)                  [A]
        -----------------------------------------------  (→-elimination)
                 A → ... (A → A) ...
                         .
                         .
                 A → A                             [A]
                 --------------------------------------  (→-elimination)
                              A
                          ---------  (→-introduction)
                            A → A
Interpretations of Derivations
167
As pointed out earlier, we cannot afford to be too dogmatic about what constitutes the correct definition of interreducibility, but so unacceptable a consequence makes it tempting to reject Martin-Lof's candidate out of hand; whatever applications his notion of convertibility may possess, it is surely not the relation we are interested in analyzing. At the very least, we should be cautious about accepting it and subject the arguments offered in its favor to careful scrutiny. There are only two such arguments, and one of them is not really relevant to the purpose at hand. The first is that, by weakening the definition of convertibility in the manner suggested above, certain technical advantages are obtained. 34 The weaker relation is certainly more manageable (cf. the case of weak vs. strong reduction in combinatory logic), although it is not without its drawbacks too. Even if we grant the point, however, it establishes only that the relation defined by Martin-Lof can be useful—and this is not in dispute. As he himself writes (on page 96): "we are free to define many different relations between terms and call them convertibility relations." He then adds: "but my claim is that only one of these correctly formalizes the [intended interpretation of convertibility]." I am interested in his contention that he has supplied the correct formalization. This brings me to the second argument: (a) The intended interpretation of convertibility is a relation between linguistic expressions called definitional equality. (b) Definitional equality is the least equivalence containing various defining equations and closed under (f) above. (c) It follows that the correct definition of convertibility simply formalizes the properties mentioned in (6) (and that is exactly what Martin-Lof's does). Once (a) and (b) are accepted, there is obviously no denying (c). It would appear therefore that, no matter which interpretation of the conjecture is favored, MartinLof's remarks about it depend ultimately on his conception of definitional equality. As was hinted earlier, however, he seems to think that this relation is a familiar one whose role needs no explanation and whose properties, once stated, are easily recognizable as such. Consequently, he does not embark on a systematic defense of (a) or (b). In introduction.) This example depends upon the fact that its conclusion is introduced by an application of —•-introduction. It is the only rule which will produce such a result in DT although similar examples can be constructed in NJ and NK using V- or 3-elimination. (A notion of normal form which is sensitive to differences of this kind is perhaps not entirely satisfactory.) Notice, however, that the subformula property will hold for certain subclasses of derivations—for example, derivations of atomic formulae in the negative fragment—so that many applications of normalization arguments will not be much affected by this weakening of the reduction relation. 34 Some of these are listed in Section 2.1 of Martin-Lof's paper, op. cit
168
Normalization, Cut-Elimination and the Theory of Proofs
view of the importance of definitional equality to the discussion, it is worth considering briefly whether he is justified in treating it in this way. Martin-Lof claims that by definitional equality he means "the relation which is used on almost every page of an informal mathematical text." He also claims that it is to be found in the writings of such authors as Frege and Godel. None of these claims will bear much scrutiny, however. It seems obvious that definitions are used informally in more than one way, and it may plausibly be argued that in one of these usages they express a relationship between signs. Nonetheless, whatever relation R holds between definiens and definiendum, R is not an equivalence. (It is probably not reflexive, symmetric or transitive, let alone all three.) In the second place, it is not closed under (f) above. (This is not to deny that the inference c(a) = c(b) is often drawn from a =df. b, but this is simply to justify the conclusion that c(a) and c(b) are the same by reference to the definition of a. It certainly does not suggest that there is some special relationship between "c{af and "c(6).") The evidence for these assertions is contained in any "informal mathematical text," where the reader will be hard put to find statements like "b =df. a because a =df. b" or "since a =df. b, it follows that c(a) =df, c(6)." This sort of nitpicking about usage does not really get us very far. It certainly provides no argument against the possibility, or even the desirability, of introducing a relation like definitional equality. On the other hand, it is surely sufficient to establish that, far from playing the central role attributed to it by Martin-Lof, this relation is not to be found in mathematics as it is currently written. Turning now to Frege, we find according to Martin-Lof that definitional equality may be identified with the relation of equality of content found in the Begriffschrift (and symbolized there by "=") "provided one disregards the geometrical example" with the aid of which it is introduced (in Section 8). He goes on to say that Frege's axiomatization of = (in Sections 20 and 21) is not "compatible with the analysis of the relation given earlier" (presumably, earlier in the Begriffschrift). The reason is that Frege gives the familiar identity axioms for =, and Martin-Lof contends that, because "a" and "6" stand for themselves in some occurrences and for their contents in others, statements like a = b —• (A(a) —* A(b)) are meaningless. He concludes that this led Frege to replace = with the more familiar equality relation analyzed in "Uber Sinn und Bedeutung" and the Grundgesetze. The implication is that Frege gave the wrong axioms for the notion he was trying to capture, and that the consequences of this mistake led him to abandon it altogether. Unfortunately, none of this is in the least plausible. It is obvious from the outset that Frege's identity of content resembles Martin-Lof's definitional equality in only one respect: it too is a relation between expressions. As the example we are asked to disregard makes clear, it is the relation
I n t e r p r e t a t i o n s of Derivations
169
which holds between two terms when they denote the same thing. The axioms presented in Sections 20 and 21 are, of course, valid under this interpretation. Furthermore, Frege seems not to have been unduly disturbed by the need to use names in a systematically ambiguous way. (Granted that this usage offends the ears of formal language speakers nowadays, it surely does not reduce the axiom quoted in the preceding paragraph to meaninglessness.) It is unlikely, therefore, that this impelled him to revise his treatment of identity. A more plausible suggestion is that, finding himself unable to distinguish between more than two judgable contents, he abandoned the notion of content altogether in favor of the doctrine of sense and reference. He then found it convenient to reformulate his treatment in terms of this distinction. The point to stress is that no great discontinuity exists between the earlier and the later accounts of identity. In fact, identity judgements are analyzed in remarkably similar terms in Begriffschrift and "Uber Sinn und Bedeutung." 35 The geometrical example of a single point determined in two distinct ways which appears in both works, albeit in slightly different guises, makes these similarities particularly evident. Frege takes pains to stress the connection between names and ways of determining. In Begriffschrift he writes that the different "ways of determination" correspond to the "different names of the thing thus determined," while in "Uber Sinn und Bedeutung" he speaks of "different designations of the same point" and states that these names "indicate the mode of presentation." 36 It is this connection which saves judgements of identity from triviality. The difference between the two works is that in the later one he grants these modes of presentation an existence apart from the names to which they correspond. Of course, he also reformulates his account of identity statements so that the relation asserted to hold is between the individuals named rather than the names themselves, but this seems to be of less interest. I am not denying the obvious formal differences between these two sorts of relation, but from most perspectives—including Frege's, I suspect—it matters little whether "a = 6" is taken to mean that "a" and "6" name the same individual, or that the individual named by "a" is the same one named by "6." Frege nowhere suggests that its way of determining is part of the content of a name. On the contrary, his analysis precludes such a possibility since it is only when the same content is determined in different ways that a non-trivial identity judgement can be made. For singular terms, at least, content can be identified with what Frege later called reference. This much seems uncontroversial and is sufficient to refute Martin-Lof's interpretation 35 For this observation, and for much else in this paragraph, I am indebted to the interesting discussion of Frege's views in Chapter 4 of An Essay on Facts by K.R. Olson (Stanford, 1987). 36 T h e phrases in quotation marks are those used by Geach and Black in their Translations from the Philosophical Writings of Gottlob Frege (3rd edition, Oxford, 1980).
170
Normalization, Cut-Elimination and the Theory of Proofs
of identity of content, even if one prefers his account of why Frege eventually revised the Begriffschrift theory. When Frege turns to definitions in Section 24 of Begriffschrift, he distinguishes them from identity judgements in the previous sense not because they are concerned with a different kind of relation—they are not—but because, being prescriptive, they are not to be regarded as judgements at all. Baker and Hacker comment on this section: "If a symbol is introduced by a formal definition, the fact that it designates an entity in a particular way . . . seems to be an altogether objective feature of it, and hence there seems pressure towards adopting the principle that in this special case the way of regarding (or the mode of determining) an entity is part of its content. Frege did not draw this conclusion."37 So, there is no comfort for MartinLof here, either. Somewhat ironically, his ideas are more easily reconciled with Frege's later views. It is only after the mode of determining has been separated from the name that it makes sense to ask whether two names determine an object in the same way, and I take this to be the question underlying his conception of definitional equality. In the final analysis, however, nothing approximating to this relation is to be found in Frege's writings. It only remains to consider what Godel has to say about definitional equality and, in particular, whether it plays any role in his Dialectica paper cited earlier. The paper describes a translation of intuitionistic arithmetic into a system T of computable functional of finite type. Roughly speaking, T comprises: (i) certain equational axioms for defining these functions, (ii) the principle of proof by induction (with respect to a numerical variable), (Hi) the usual axioms for identity, and (iv) the propositional calculus—including axioms of the form (s = £) V (s ^ t) for all terms s and t. Martin-Lof contends that 'equality' in (i) is not the relation which is described in (Hi) or appears in an equation obtained with the aid of (ii). His reason is that he takes the former to be definitional equality, whereas "we cannot convince ourselves [of the validity of (ii) or (Hi)] unless, when reading the formulae, we associate with the terms not themselves but the abstract objects which they denote." (page 106). In fact, he upbraids Godel for remarking in a footnote to the identity axioms that this relation is to be understood as "intensional definitional equality." 38 Rather sur37
Frege: Logical Excavations by G.P. Baker and P. Hacker, New York, 1984, page 160. T h e reason for the remark is that from a constructive viewpoint (extensional) equality between functions of higher type is not a decidable relation. The axioms mentioned in iv) are essential for the translation, however, so " = " must be given some other interpretation if T is to be constructively acceptable. 38
Interpretations of Derivations
171
prisingly, this remark probably gave rise to the idea of definitional equality as a relation between terms which is determined by their conversion rules. Tait, while discussing the interpretation of equality in (certain extensions of) T, observes that according to Godel's own interpretation: "s = t means that s and t denote definitionally equal reckonable terms." (The term he translates as "reckonable" is rendered by "computable" above.) He goes on to say: "Lacking a general conception of the kinds of definitions by which an operation may be introduced, the notion of definitional equality is not very clear to me. But if . . . we can regard the operations . . . as being introduced by conversion rules . . . then definitional equality has a clear meaning: s and t are definitionally equal if they reduce to a common term by means of a sequence of applications of the conversion rules." 39 This is essentially Martin-L6f 's view, except that he is less diffident about the general notion of definitional equality.40 There is a difference, however. For the reasons alluded to earlier, Martin-Lof regards definitional equality, not as a possible interpretation for identity in T (or any other theory), but as a relation satisfying different laws. Despite what Tait writes, it is doubtful whether Godel really intended s = t to express a relationship between terms. If he had, it is difficult to understand why he regarded T as constituting an extension of finitary mathematics. According to his own account: Bernays' observations . . . teach us to distinguish two component parts in the concept of finitary mathematics, namely: first, the constructivistic element, which consists in admitting reference to mathematical objects or facts only in the sense that they can be exhibited or obtained by construction or proof; second, the specifically finitistic element, which requires in addition that the objects and facts considered should be given in concrete mathematical intuition. This, as far as the objects are concerned, means that they must be finite space-time configurations of elements whose nature is irrelevant except for equality or difference. . . . 39
Tait, op. cit, page 198. He is no doubt encouraged in this by Kreisel who makes light of Tait's misgivings in a review of the latter's paper (in Zentralblatt fur Mathematik, Vol. 174, 1969, pages 1213). Kreisel attributes them to the view—mistaken, in his opinion—that to make sense of definitional equality for constructive operations it is necessary to have a listing of their possible arguments. (He makes the same point in "A Survey of Proof Theory II," page 156: "Tait expressed doubts in [1967] about the sense of definitional equality t = t' unless all possible arguments o f t and t' are listed.") This criticism seems misplaced, however. The passage quoted above points out only that the notion of definitional equality depends upon what constitutes a definition, and that we lack a general answer to this question. It is hard to find fault with this observation. (Presumably, even definitional equality between number-theoretic functions is not very clear to Tait, although there can be no doubt about their possible arguments.) 40
172
Normalization, Cut-Elimination and the Theory of Proofs It is the second requirement which must be dropped [in the face of negative results about the provability of consistency].41
He goes on to say that his theory T is one result of doing so. Now, the terms of T conform to the finitistic description of objects; the conversion rules which are supposed to settle questions of their identity and difference are finite combinatorial operations, and the theorems of T are just (propositional combinations of) equations of the form s = t. The intended interpretation of T, i.e., what its theorems are supposed to be about, must therefore lie outside this domain. In fact, it is clearly stated that T is a theory of certain abstract objects, the computable functionals of finite type; this granted, equations in T must surely express a relation between these abstract objects. It should be apparent by now that there is little in Godel's paper to support Martin-Lof's analysis of definitional equality. In fact, I have tried to argue that there is little reason to accept it at all. Few arguments, and no compelling ones, have been advanced in favor of the thesis that the intended interpretation of convertibility is a relation between linguistic expressions, and the same can be said of his claim that, because its intended interpretation does not satisfy the laws of identity, the definition of convertibility must be weakened in the manner indicated above. Furthermore, even if one accepts definitional equality (in the sense of Martin-Lof) as the correct interpretation of convertibility, it seems to me that this is quite simply a case of explaining obscururn per obscurius: Martin-Lof succeeds neither in establishing that the former is a familiar relation, nor in explaining its significance. These strictures notwithstanding, Martin-Lof's discussion of identity between proofs does focus on the central questions: where proofs are to be located in the scheme of things, and what sort of equivalence relations might hold between them. That these issues should be contentious at all is due, in part at least, to the view that proofs are intensional and that a relation other than the familiar one of extensional equality holds between them. The notion of intensionality is a notoriously problematic one. It appears under a variety of different names, and the term "intensional" has been used to express a variety of different distinctions. 42 This is not the place 41 T h e quotation is from a revised English version of Godel's Diaectica paper (see Vol. II of his Collected Works, page 274), but a similar passage occurs in the original (ibid., page 244, or Dialectica, Vol. 12, page 282). It is unclear why Tait interprets s = t to mean that s and t denote definitionally equal terms. Three or four lines earlier he suggests that s and t denote functionals. These statements are hard to reconcile unless terms are supposed to be used ambiguously in the manner recommended by Martin-Lof. There is no mention of this, however, which makes it seem an unlikely possibility. Perhaps it is simply a slip on his part, occasioned by the existence of term models for T or his reservations about the notion of definitional equality. 42 This is well illustrated by the appendix to Fr. Frisch's essay Extension and Com-
I n t e r p r e t a t i o n s of Derivations
173
to survey or evaluate these usages, but I do think it worth distinguishing between two of them which, although not independent of each other, can be separated. In so doing, I do not mean to suggest that these two are the only ways to understand the term or, least of all, that they are the only correct ones. My claim is simply that elements of both are present in discussions about proofs and that some advantage is to be gained from distinguishing between them. The first sense is easy to explain: intensional means not satisfying an extensionality principle of the appropriate kind. The second is harder to make precise, so I propose to give only a rough idea of what I have in mind. A distinction can be drawn between some domain of elements and a system which describes or represents it. The domain can be anything from the universe in which we find ourselves to a mathematical structure. As for the representational system, it is usually thought of nowadays in linguistic terms, and I shall follow this practice below, but it need not be. Ideas, for example, could serve just as well, and would have done so in another age. I am less concerned with the exact nature of these two items than with the relationship between them. To explain this, i.e., how language can refer to the elements of the domain or, alternatively, how these can be grasped in linguistic terms, an intermediate realm is sometimes postulated which consists of 'ways of determination' or 'modes of presentation'—to borrow a pair of phrases from Frege. In its second sense the word "intensional" characterizes the denizens of this realm. I prefer this usage and propose to adopt it below, reserving "non-extensional" for the first sense. 43 Kreisel respects none of the distinctions drawn above. His conception of intensionality is patterned after the paradigm of a predicate or formula which represents a property and its extension. Here, formal systems play the role of predicates. The proofs expressed by such a system can be compared to the property, while the conclusions established by these proofs are analogous to its extension. In case the system generates arguments from assumptions, comparison with a functional term is more appropriate: prehension in Logic (New York, 1969) in which he lists one hundred and seventy eight modern (i.e., since 1662) logicians, the terms they use to express the distinction between intension and extension, and what they understand it to be. The remarkable feature of this list is not that there are similarities between its entries, but that scarcely any two of them are exactly alike. 43 I am quite comfortable with the division into objective (i.e., pertaining to the objects of our interest—not, as opposed to subjective), intensional and linguistic spheres. I realize, however, that there are those who deny the independent existence of one or more of them and others who deny their existence outright. Although this presents a potential source of problems, I hope that the discussion which follows will be acceptable even to those who reject my metaphysical prejudices. All it depends upon really is the claim that we can draw some kind of distinction between these three elements of our experience. Whether it is a distinction in re or merely in intellectu need not be decided here.
its consequence relation is analogous to a function in extension, while its proofs correspond to the procedure by which values are obtained from arguments. For example, Prawitz, who shares this general outlook, comments on Frege's rules for the predicate calculus that they "may be understood as an extensional characterization of logical proofs (i.e., a characterization with respect to the set of theorems) within certain languages . . . but the characterization is only extensional since the formal derivation[s] may use quite different methods of proof and have a structure different from the intuitive proof[s]."44 It is not just the system as a whole which may have an intensional feature, however. Individual proofs may also be described as intensional objects. Kreisel, for example, at the beginning of his review of The Collected Papers of Gerhard Gentzen refers to them as such when emphasizing that "the distinction between different formal systems with the same set of theorems, in terms of the proofs expressed by their derivations, is meaningful," and he goes on to note: "I use 'express' for the relation between a formal expression E and the intensional object meant by E, and 'denote' for the case when we suppress the intensional features of the object, for example, in model theory."45 It seems then that there are intensional objects but these are continuous with extensional ones—the latter being simply the former minus some of their characteristics. (This is a sort of opposite to the view, espoused by Quine amongst others, that a property is a set with something added.) In the present case, presumably, what the object asserts is an extensional feature, how this assertion is established an intensional one. This view may reasonably be described therefore, in the terms used earlier, as one which associates intensions with how objects are presented and, at the same time, insists that objects cannot be separated from their mode of presentation. As for the criterion of what constitutes an intensional feature, this is provided by a principle of extensionality. So, for Kreisel, there is no difference between intensional and non-extensional. What form such a principle should take is assumed to be uncontroversial. This assumption—although almost universal—is, I think, unfortunate. Consider, for example, proofs as conceived above. Once their intensional features are suppressed, we are left with the conclusions they assert (on the basis of their assumptions). The extensionality principle underlying Kreisel's analysis seems, therefore, to be one derived from regarding the consequence relation as the analogue of a function in extension, namely: two proofs are extensionally equal if they have the same assumptions and conclusion. Now, this is certainly a possible criterion, but it is not the only one. In fact, it even conflicts
44 "Ideas and Results in Proof Theory," page 238.
45 Journal of Philosophy, Vol. 68, 1971, page 243.
with the obvious extensionality principle for proofs, when each one of them is regarded as a function along the lines suggested by the analogy with λ-terms.46 Intensionality appears in another guise too, associated now with the way in which an expression determines an object (rather than with the mode of presentation of an object). Somewhat confusingly, therefore, intensional objects seem to be continuous not only with extensional ones, but also with linguistic representations. For example, after an analysis of Gödel's Second Theorem, Kreisel remarks that "not only deductions treated as extensional objects are relevant here . . . but even additional information or 'structure', namely the sequence of operations involved in building up the deductions."47 It is true that the features of a derivation which, by implication, do not count as extensional are here associated with how it is constructed (i.e., with its mode of presentation) but, thought of in this way, the distinction is purely conventional and arbitrary. I raised this same point earlier in connection with the elimination rules for ∨ and ∃, and when comparing L with N rules. The fact is that formal derivations always bear traces of their construction; the extent to which they do so varies from calculus to calculus, and even from rule to rule within a calculus. It only makes sense to distinguish between extensional and intensional features in this context if derivations are thought of not merely as formal objects, but as expressions. Their extensional features are sufficient to determine what they express, their intensional ones indicate how. This does indeed seem to be the point of the remark quoted above, namely, that for the theorem to hold it matters not only what proof a derivation expresses, but also how (as "the proof X," for example, or as "the proof X than which there is no earlier proof of the negation of its conclusion in some fixed listing"). The intensional features described in the preceding paragraph differ from those discussed earlier. Proofs were originally said to be intensional because they mediate between derivations and assertions; now there are intensional elements which mediate between derivations and proofs. Perhaps these too deserve to be called proofs. Prawitz seems to encourage this usage when he implies that derivations which represent the same proof are synonymous.48 Certainly, it underlies the distinction drawn by Martin-Löf between (linguistic) proofs and abstract ones. His point, apparently, is that a proof can be a certain kind of abstract object or a way of determining (or defining) such an object. He has little to say about what constitutes
46 The division of an object's characteristics into extensional and intensional ones is oddly reminiscent of the traditional distinction between primary and secondary qualities. The former inhere in the object itself, the latter have to do with how it appears to us. The problem here is that it is not altogether obvious where the line is to be drawn.
47 "A Survey of Proof Theory II," page 179.
48 "Ideas and Results in Proof Theory," page 237.
identity between proofs in the first sense. As for the second, identity does not mean determining the same object, but determining it in the same way; hence it is a matter of definitional equality. In the terms I introduced earlier, proofs in the second sense are intensional objects. Unfortunately, Martin-Löf, for reasons which he never makes explicit, treats the intensional as a subdivision of the linguistic. As a result, he refers to them as expressions of an informal language. What emerges from the above is, I think, not only that "intensional" and "extensional" can each be used in different ways, but also that they are so used—sometimes by the same author in a single work. This is relevant to our inquiry because, as I suggested earlier, the language of extensions and intensions provides the framework within which the most basic and general questions about the nature of proofs are to be settled. I do not wish to imply that any of the authors whose views I have been discussing are confused in their use of these terms, but they are perhaps a little confusing. I have tried to indicate above, in a reasonably unambiguous manner, what I understand by "intensional" and "extensional." It remains for me now to classify proofs according to the concepts so introduced. My proposal is that we should not regard them as intensional objects at all, but simply as the denotations of derivations. This does not rule out a study of the relationship between proofs and what they establish, but it does affect how the relation is to be described. It is also not intended as a comment upon the possibility or the interest of a study of how derivations denote proofs. Contrary to what Martin-Löf suggests, however, I believe that, if such a study is to be successfully undertaken, it can only be after we have gained a better understanding of proofs themselves (including their identity criteria). We would be ill-advised, therefore, to concentrate upon it at the outset. As for the question of whether or not proofs are extensional, and even what constitutes an appropriate criterion of extensionality for proofs, these are matters for investigation. Certainly, we cannot simply assume that they are non-extensional.49
49 Those who are made uncomfortable by talk of intensional objects can interpret my proposal simply as a methodological principle (or even, less kindly, as a terminological one). It amounts to no more than the claim that the objects of interest in a particular field of inquiry should be separated from the way in which they are presented. If it should happen that we are interested in a domain O of objects which comprises the members of another domain O′ together with their modes of presentation R, we should be especially careful not to confuse the members of R with the manner in which we represent the members of O. I realize that, even interpreted in this way, the principle is not philosophically neutral. In fact, it contradicts a famous view of the foundations of mathematics. Its rejection, however, seems to lead almost inevitably to obscurity and confusion. For what it is worth, I think that even writers from the intuitionistic standpoint adopt it de facto. Because it violates their principles, however, they do so neither very explicitly nor always consistently.
There are immediate benefits to be obtained from adopting this proposal.
(1) There is a gain in clarity. Viewed in the way suggested above, the formal study of proofs takes place within a well-developed conceptual framework, that of model theory. We are interested in models of certain formal systems whose terms are derivations. Our ideas about their intended interpretations are, perhaps, not as precise as we would like. But, as I argued in the introduction, we may hope to clarify them by investigating what interpretations they will in fact admit.
(2) I remarked at the beginning of this chapter on the possibility that the analogy between derivations and λ-terms might be incompatible with that between derivations representing the same proof and sentences having the same meaning. It clearly is if meanings are supposed to be intensional. From the perspective of the λ-calculus, the appropriate comparison is not with sentences and their meanings, but with terms and their denotations. In other words, the view of the relationship between derivations and proofs taken above is forced on us if we want to take seriously the analogy between derivations and terms. This analogy, however, has been the mainstay of the subject as developed by Prawitz et al., and there is no reason to suppose that its usefulness—as a source of results, for example—has been exhausted yet.
(3) I think it is regrettable that the interest of a general theory of proofs for classical mathematics has seldom been emphasized. The fact that proofs play a special role in mathematics conceived intuitionistically, as part of the subject matter of ordinary mathematical assertions, should not blind us to their importance for mathematics on any conception of the subject. As I remarked earlier, the claim that classical mathematicians are interested only in consequence, and not in the proofs themselves, does not do justice to the facts. Questions concerning the identity of proofs, for example, or their constructive content are no less interesting from a classical, than from any other point of view. On the other hand, intensions have no role in classical mathematics. It seems to me, therefore, that there ought to be a classical theory of proofs, and that the distinction drawn above between a derivation's denotation and its manner of denoting provides a convenient way of differentiating classical from intuitionistic approaches to the subject.
(4) The distinction is also useful for resolving, or at least clarifying, the nature of the disagreement between Prawitz and Feferman alluded to earlier.
On the one hand, the view that N derivations adequately represent proofs, their similarity to the terms of certain calculi and the interpretation of the convertibility relation in these calculi combine to make it almost impossible to deny that interreducible derivations represent the same proof. On the other, such derivations are clearly not just linguistic variants of one another; there can be no doubt that information is lost in the process of reduction. The only apparent way to reconcile these facts is to classify the information lost as relevant not to the proof itself but to its mode of presentation. In other words, although interreducible derivations describe the same proof, they may do so in different ways (just as "the author of Waverley" and "the author of Waverley and Kenilworth" describe the same individual, although information is lost when the first replaces the second). This is not to deny the possibility of other conceptions of the subject, according to which distinctions between proofs are expressed by differences between such derivations; they may well be necessary for some purposes. Yet, I want to claim more than that a coherent and interesting notion is also arrived at by identifying the denotations of interreducible derivations. It seems to me that the ideas and methods currently employed in the general theory of proofs—the emphasis on strong normalization, the comparison of proofs with functions, etc.—presuppose a notion of this kind. (This seems to be Prawitz's point too, when he remarks that the real issue is whether a proof can indeed be identified with "the result of applying certain operations to obtain a certain end result.") If it should turn out to be inadequate, new insights and techniques will be needed to study its replacement.
(5) Finally, I think the distinction between a derivation's denotation and its way of denoting helps to clarify the status of permutative reductions. The idea that any reduction step preserves the way of denoting is, in my opinion, wholly implausible. Recall that, given any derivation in N or L, it is easy to construct another one with the same normal form which is arbitrarily complex (on any measure of complexity). Although I do not subscribe to the view that meanings are psychological, it does seem to me that psychological criteria can play a role in evaluating a theory of meaning. For example, if two expressions present an object in the same way, anyone familiar with the conventions governing their use should be able to recognize this fact immediately. By the above, however, it is far from obvious in general when two derivations reduce to the same normal form. (In my opinion, this argument counts also against Martin-Löf's conception of definitional equality.) As for permutative reductions in particular, they alter those features of a derivation which, according to Kreisel at least, are paradigms of intensional ones. Once the idea in question is abandoned, it is possible to look at permutative reductions in a new light. Viewed in the abstract, there is little to
distinguish one permutation of inference from another. As I observed earlier, it is difficult even to imagine what kind of evidence would legitimate some of these whilst ruling out others. The obvious conclusion to be drawn is that, given any property of proofs, it is either preserved by all such permutations or by none. The circumstances described in (4) above, as well as the interpretations of derivations discussed below, would seem to favor the first of these alternatives. Unfortunately, however, arbitrary permutations can radically alter the structure of a derivation, which implies that much of this structure cannot correspond to features of the proof being described. Its importance, on the other hand, is undeniable and not explicable simply as a matter of syntax. The position being advocated, then, is that permuting any pair of inferences in a derivation leaves its denotation unchanged, but alters the manner in which this denotation is presented. This does justice to the importance of the structure of a derivation, whilst removing the need to perform the seemingly impossible task of judging between permutative reductions. I do not mean to imply that there are no grounds for distinguishing between different sets of permutative reductions. For certain applications it may obviously be necessary to restrict attention to a group of them which is adequate (in the sense that it allows every derivation to be reduced to a normal form), yet has certain desirable properties; it ensures that normal forms are unique, for example, or that every reduction sequence must terminate. But these distinctions are based on considerations of technical expediency. It is not surprising, therefore, that they should be made differently in different formal contexts, and unlikely that any profound consequences can be drawn from their being so. One of the virtues of the multiple-conclusion approach to these matters is that it reveals very clearly both the arbitrariness of the restrictions placed upon permutative reductions in the conventional reduction procedures (in particular, how they are motivated by syntactic features of the calculus concerned) and the wide variety of choices available when it comes to selecting such restrictions. The preceding provides, I think, a reasonable explanation of this situation.
The views expressed above, especially in (5), have an apparently disturbing implication. It is that in some respects proofs might be better represented by derivations whose inferences cannot be permuted. A single such derivation could then be associated with each group of N or L derivations whose members differed from one another only in the order of their inferences. As I argued at the end of Chapter 4, however, the possibilities for a representation of this kind seem rather limited. The most promising appears to be a calculus of the sort mentioned there in which the conclusions of a rule need not be connected to its premises; but it remains to be seen whether this possibility can be realized.50
One objection raised against this sort of calculus in Chapter 4—that the representation it provides is not uniform because the relationship between the premise and conclusions of rule (4) is treated differently from that between the premises and conclusions of the other rules—could perhaps be overcome by making more widespread changes in the structural effect of applying these rules. Such an expedient would only serve, however, to reinforce the other objection, namely, that a calculus of this kind is less than ideal for representing the actual process of reasoning. That this should be so is, in my opinion, simply a reflection of the difference between a proof as conceived above and a particular piece of reasoning. The latter is properly regarded as a mode of presenting the former. My proposal now is that this difference can be regarded as a special case of the distinction between the grounds for an assertion and the procedure by which they are established. Expressed in these terms the distinction is a familiar one—albeit one which, because it corresponds to the distinction between truth and proof, is usually held not to apply to the present situation. This simple dichotomy is, however, misleading. It seems to me that the ambiguity inherent in the notion of justification is more accurately conveyed by a whole spectrum of possible meanings than by a pair of clear-cut alternatives. At one end of this spectrum is the view, espoused by Frege amongst others, that all justifiable assertions have the same justification, namely, they denote the True. At the other is Brouwer's conception of a justification as a singular process that takes place at a particular point in time; of course, an extrapolation from or description of such a process may also be called a justification, but this is clearly a secondary meaning which applies to something only insofar as it can serve to produce a justification in the primary sense. Justification in Frege's sense is certainly general enough, but this generality is purchased at the expense of informativeness; exactly the reverse could be said of Brouwer's conception. Our general philosophical perspective may determine how wide this spectrum appears to be, and may blind us to certain parts of its range. I am however more concerned with the other factors which influence where along this spectrum an acceptable justification is thought to be located. One such factor—perhaps the decisive one—is the nature of the assertion to be justified. As a rule of thumb, the distance from the Fregean end seems to be inversely proportional to the degree of obviousness and accessibility of the procedure by which an assertion of the kind in question is established. So, for example, a claim to the effect that its grounds can
50 The project of characterizing the equivalence relation generated by permutations of inference is of interest even if one rejects the thesis that interreducible derivations represent the same proof. Whatever view is taken of the proper reduction steps, if order of inference has some real significance for proofs, I do not see how either N or L derivations can be said to represent them adequately.
be established is usually sufficient to justify an assertion about observable events close at hand. If there are obstacles in the way of observing an event, the justification of assertions about it may require additional information indicating how their grounds were established. To describe a justification in this way, however, is also to interpret it. For example, the grounds for asserting that it is 70° and sunny in Athens are certain meteorological conditions at a particular time and place, and the weather report in The Times provides a means of ascertaining whether these conditions do in fact obtain; but, if for some reason I am not particularly interested in Athens itself and have made the assertion on the basis of the weather report, it might be more natural to describe the report itself as the grounds for my assertion. Of more interest in the present context are mathematical assertions. Here, the procedures by which they can be established are well known, but relatively inaccessible. As a result, they normally require a more elaborate justification. (A simple "factual" claim may still suffice, however, when these procedures are accessible—e.g., in the case of a particularly obvious assertion, or one intended to convince only specialists.) Here too there is a question as to how such a justification is to be characterized. The difference is only that in the present case the question is taken more seriously. (After all, the view that Athens exists only insofar as it is reported on in newspapers is currently out of favor even amongst philosophers; its mathematical analogue, however, is actively discussed.) It may be helpful to compare a mathematical assertion to an empirical one pertaining to a state of affairs which cannot be investigated directly. For example, on a foggy night the location of a sandbank may be inferred from the sound of a bell or a siren, the sight of a warning buoy or light, from depth soundings or from any combination of these. According to some mariners, the sandbank itself, a particular configuration of matter, is distinct from all of the above; they only constitute ways by which we may come to know about it, albeit indirectly. Under other circumstances, we might hope to see the bank directly and comprehend all its properties including its location.51 According to others, there is no such configuration of matter. To assert the existence of a sandbank at some location is simply to make a statement about the possibilities of undergoing certain kinds of experiences (seeing buoys and hearing sirens, for example, in a particular place). The sandbank itself is one of those "noxious ornaments, beautiful in
51 This somewhat heavy-handed extension of the analogy is intended to provide a metaphorical account of Plato's conception of mathematics. Mathematical objects enjoy an independent existence and under the right circumstances, with the appropriate effort and philosophical training, we may hope to know them directly (i.e., when the fog lifts and the sun comes up we may hope to see them). In the meantime, we must learn about them with the aid of objects we do know (can see) which reflect their properties. See, for example, Republic, 510e.
form, but hollow in substance," which must be excised from our ontology; any risks incurred in so doing will be "at least partly compensated for by the charm of subtle distinctions and witty methods" by which it enriches our thought.52 My purpose in mentioning all this is not to adjudicate between different views, but to illustrate how many different distinctions can be drawn when it comes to a matter of justification. There are at least the following:
I. the particular configuration of sand and its location [mathematical objects and facts about them]
II. evidence for I (for example, sirens and lights, or bells and buoys) [proofs]
III. particular presentations of II (for example, the sighting of a buoy followed by the sound of a bell) [modes of presenting proofs]
IV. linguistic expressions which describe II in terms of III [derivations]
V. a particular experience (for example, seeing the buoy and hearing the bell) [the grasping of a proof].
As I have already indicated, there are those who would dispute the significance of some of the above. Furthermore, the way in which I have chosen to describe them is not neutral. (For example, to characterize the activity V as the grasping of a proof is to give II a priority or, at least, an independence which will offend those for whom II is derivative upon V.) Nevertheless, this list does serve to indicate where proofs, conceived of as the denotations of formal derivations, can be located in the scheme of things. Of course, I have not provided an analysis of this notion of proof, nor can I do so here, but I hope that I have done enough to make the idea of distinguishing between proofs and reasoning at least seem coherent.53 The ideas discussed in (1)-(5) above are of a rather general and speculative kind, although some do contribute to the solution of quite specific problems. As a result, their value depends to some extent on their ability
52 A. Heyting, Intuitionism, 3rd edition, Amsterdam, 1971, page 11.
53 I realize that the view I have espoused is an unconventional one, violating both linguistic and philosophical orthodoxy, but I do think it has much to recommend it. For example, I remarked earlier that mathematicians are said to be interested in questions about the identity of proofs. It is doubtful, however, whether they are as interested in such questions in the abstract as they are in new proofs, even of known results. Whatever "new proof" means in this context, it surely is something more than "obtained from the old one by the kind of structural operations on derivations considered above." (Anyone who attempted to publish a proof which was "new" in this sense would invite ridicule.) On the other hand, a determinate structure is an integral part of an argument or a piece of reasoning. There seems to be no other way of reconciling these facts than by distinguishing proofs from reasoning.
to interact fruitfully with more formal and specialized work in logic and mathematics. By a fruitful interaction, I mean both that there should exist formal results which support them and which acquire significance when interpreted in terms of them, and that they should suggest new lines of inquiry. Whether such interaction is in fact possible remains to be seen. Even at this stage, however, it seems to me that there are interesting connections between these ideas and some formal work. I would like to conclude by mentioning two examples which, in my opinion, illustrate this fact.
(1) The analogy between derivations and terms, coupled with the idea of looking for models which the former will admit, leads naturally to a consideration of the full type-structure over some domain. Harvey Friedman has considered structures of this kind and characterized the identity relation between their elements in terms of convertibility (or provable equality) in a version of the typed λ-calculus.54 The version in question includes—in addition to axioms and rules for identity, an axiom for λ-conversion and an axiom which allows changes of bound variables—an extensionality principle: ((λx)(sx)) = s if x is not among the free variables of s. After defining a notion of structure appropriate to the typed λ-calculus, he proves its soundness—showing by induction on the length of the derivation of s = t that, if ⊢ s = t, then M ⊨ s = t for all structures M. He then constructs a particular structure M0 whose elements are terms of the calculus factored out by the equivalence relation of being provably equal and shows that it is a structure in his sense. Once this has been established, it follows almost immediately that, if M0 ⊨ s = t, then ⊢ s = t. (Friedman refers to this as a completeness theorem for the typed λ-calculus because, when taken together with the soundness result quoted earlier, it establishes the equivalence of the following three conditions: (i) ⊢ s = t; (ii) for all structures M, M ⊨ s = t; and (iii) M0 ⊨ s = t.) Friedman goes on to establish what he calls an extended completeness theorem for the typed λ-calculus. He first defines a notion of partial homomorphism between structures which has the property that, if there exists a partial homomorphism from M onto M′ and M ⊨ s = t, then M′ ⊨ s = t. Given any set B, TB—the full type-structure over B—is the paradigm of a structure for the typed λ-calculus. Friedman proves that, for any infinite
54 "Equality Between Functionals," Springer Lecture Notes in Mathematics, Vol. 453, 1975, pages 22-37. Actually this paper deals with a wider range of topics than the above description indicates. Friedman was interested in classifying the relations of equality and being everywhere different between functionals of various classes according to their complexity. The completeness theorem quoted in the text is an intermediary towards showing that equality between simple functionals (i.e., those in the full type-structure over ω which are defined by a closed term of the typed λ-calculus) is recursive.
set B, there is a partial homomorphism from TB onto M0. This allows him to conclude that TB ⊨ s = t is equivalent to any of (i)-(iii) above. Equality in the structure TB is of course set-theoretical equality between functions. Translated into the language of derivations, the extended completeness theorem asserts that, when closed derivations in the pure implicational fragment of NJ are interpreted as denoting functionals in the set-theoretic sense over some (infinite) domain of atomic proofs, Π and Π′ are interreducible (using Prawitz's reduction steps augmented by expansions) iff they denote the same functional. I think this is an interesting statement; it provides an illuminating characterization of interreducibility, albeit for a restricted class of derivations, and it demonstrates that proofs need not disappear into the consequence relation when they are interpreted as extensional objects. In addition, it suggests some interesting questions for further investigation. For example:
(a) Can this result be extended to all of propositional NJ? It seems to be a routine matter to extend the theorem to the negative fragment of NJ by adding product types and modifying the notion of structure accordingly. If the type-structure is then extended to take account of disjunction, it seems clear that all permutations of inference will preserve equality. A more problematic issue is whether the converse holds, i.e., whether it will still be true that derivations which denote the same functional are interreducible. The difficulty lies not so much with disjunction itself as with the thinning that accompanies it. It is unclear how best to treat this latter operation in a functional context.
(b) Can it be extended to NK? Multiple-conclusion logic suggests a natural generalization of the notion of function and of the type-structure which may provide an appropriate framework within which to tackle this question.
(c) What is the significance of the expansions? In view of Prawitz's remark quoted earlier, it may be that expansions are of no particular interest, and the need for them here is neither significant nor disturbing. On the other hand, it may be worth considering the possibility of obtaining the same result for the interreducibility relation generated by the reduction steps alone. This could perhaps be accomplished by interpreting derivations not as particular proofs, but as proof patterns, i.e., by regarding their minimal formulae as variables and identifying derivations which differed only with respect to these. Derivations might then be interpreted as denoting (untyped) partial functions—the difference between Π and Π′, when the latter is obtainable from the former by
expansion, being reflected in a difference in their domains of definition (Dom(Π′) ⊆ Dom(Π)).
(2) Another possible interpretation for derivations is as the morphisms of a category. Again this might provide evidence for the choice of reduction steps—in particular, for whether there is any reason beyond expediency to restrict permutations. The first obstacle to be overcome on this approach is the need to find a suitable generalization of the notion of category which will accommodate morphisms with a series of domains and, if LK derivations are to be interpreted, a series of ranges as well. In Appendix C below, I have sketched very briefly some work in this direction. I have also outlined what I think is a natural generalization of categories to morphisms with more than one domain and range, suggested by the multiple-conclusion calculus considered earlier. It then turns out that the structural axioms for these generalized categories force us to identify derivations which permute to one another—thus providing some additional support for (5) above. Zucker's non-terminating reduction sequence appears in this context as an innocuous example of an infinite series of terms all of which refer to the same morphism.
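As a purely illustrative aside on the convertibility relation invoked in (1), the following sketch shows one mechanical reading of the idea that two terms are identified when they have a common reduct. It is not Friedman's construction: the term representation, the function names, and the restriction to beta-conversion (omitting eta and the type system) are simplifying assumptions of my own.

```python
# Minimal illustrative sketch (my own, not from Friedman's paper): lambda-terms,
# capture-avoiding substitution, beta-normalization, and a convertibility test that
# compares normal forms up to renaming of bound variables.
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

_counter = [0]
def fresh(base):                       # a variable name not used before
    _counter[0] += 1
    return f"{base}_{_counter[0]}"

def free_vars(t):
    if isinstance(t, Var): return {t.name}
    if isinstance(t, Lam): return free_vars(t.body) - {t.var}
    return free_vars(t.fun) | free_vars(t.arg)

def subst(t, x, s):                    # capture-avoiding substitution t[x := s]
    if isinstance(t, Var):
        return s if t.name == x else t
    if isinstance(t, App):
        return App(subst(t.fun, x, s), subst(t.arg, x, s))
    if t.var == x:
        return t
    if t.var in free_vars(s):          # rename the bound variable to avoid capture
        v = fresh(t.var)
        return Lam(v, subst(subst(t.body, t.var, Var(v)), x, s))
    return Lam(t.var, subst(t.body, x, s))

def normalize(t):                      # beta-normalization (terminates on typed terms)
    if isinstance(t, Var): return t
    if isinstance(t, Lam): return Lam(t.var, normalize(t.body))
    f = normalize(t.fun)
    if isinstance(f, Lam):             # beta step: (lambda x. b) a  ->  b[x := a]
        return normalize(subst(f.body, f.var, t.arg))
    return App(f, normalize(t.arg))

def alpha_eq(t, u, tenv=(), uenv=()):  # equality up to renaming of bound variables
    if isinstance(t, Var) and isinstance(u, Var):
        ti = tenv.index(t.name) if t.name in tenv else None
        ui = uenv.index(u.name) if u.name in uenv else None
        return ti == ui and (ti is not None or t.name == u.name)
    if isinstance(t, Lam) and isinstance(u, Lam):
        return alpha_eq(t.body, u.body, (t.var,) + tenv, (u.var,) + uenv)
    if isinstance(t, App) and isinstance(u, App):
        return alpha_eq(t.fun, u.fun, tenv, uenv) and alpha_eq(t.arg, u.arg, tenv, uenv)
    return False

def convertible(t, u):
    return alpha_eq(normalize(t), normalize(u))

I = Lam("x", Var("x"))
print(convertible(App(I, Var("y")), Var("y")))   # True: (lambda x. x) y  and  y
print(convertible(Var("y"), Var("z")))           # False
```

On typed terms normalization is guaranteed to terminate, so a test of this kind decides provable equality for them; on untyped terms it is at best a semi-decision procedure.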
The conclusions I have reached may seem surprising at first sight but there is, I believe, much to recommend them. On quite general grounds, the conception of proofs as rather loosely structured objects is a plausible one. Once it is accepted and the denotations of formal derivations are viewed in terms of it, a unified treatment of cut-elimination and normalization becomes a possibility and the general theory of proofs is freed to some extent from the shackles of syntax. Although it is impossible to predict with any certainty how the subject will develop in the future, the direction I have indicated does show some promise of being a fruitful one—or so, at least, I have tried to argue.
Appendix A
A Strong Cut-Elimination Theorem for LJ
This appendix discusses the possibility of avoiding Zucker's negative result about strong cut-elimination for LJ by altering his conventions for the indexing of formulae. As I remarked in Chapter 2 above, his particular counterexample to strong cut-elimination depends upon a special, and perhaps rather unnatural, feature of these conventions. The question that remains is whether an alternative indexing system could avoid such counterexamples altogether. The version of LJ presented below is essentially the one adopted by Zucker in his paper.1 A sequent has the form Γ ⊢ A, where Γ is a set of indexed formulae; negation is defined in terms of a constant ⊥ for falsity, and there is no thinning rule. I have however altered the conventions which govern the indexing of formulae and, as a result, contraction becomes a derived rather than a primitive rule. My desire to follow Zucker's treatment as closely as possible explains the unusual formulations of ∨- and ∃-left. As before, Γ, Δ, . . . are supposed to range over sets of indexed formulae, and i, j, k, . . . over indices. I will write Γ, Δ for Γ ∪ Δ and Γ, Ai for Γ ∪ {Ai}; this notation is not intended to imply either that Γ ∩ Δ = ∅ or that Ai ∉ Γ. When I do want to indicate this, I will use Γ; Δ and Γ; Ai, respectively. Finally, Γ; (Ai) will be used to denote ambiguously Γ; Ai and Γ. (In the latter case, it is assumed that Ai ∉ Γ.) Similarly, Γ, (Ai) will be used to denote Γ, Ai and Γ, when it is left open whether or not Ai ∈ Γ. The notations Γ; (Δ) and Γ, (Δ) are explained in an analogous way. I take the calculus LJ to consist of the following.
Axioms:
   Ai ⊢ A
   ⊥i ⊢ P   (P atomic and different from ⊥)
1 "On the Correspondence between Cut-Elimination and Normalization," Annals of Mathematical Logic, Vol. 7, 1974, pages 1-156.
Logical Rules:

Right rules:

   ∧-right:    Γ ⊢ A    Δ ⊢ B
               ----------------
                Γ, Δ ⊢ A ∧ B

   ∨-right:    Γ ⊢ A               Γ ⊢ B
               ----------          ----------
               Γ ⊢ A ∨ B           Γ ⊢ A ∨ B

   →-right:    Γ; (Ai) ⊢ B
               -------------
                Γ ⊢ A → B

   ∀-right:     Γ ⊢ A(a)
               ------------- *
               Γ ⊢ ∀xA(x)

   ∃-right:     Γ ⊢ A(t)
               -------------
               Γ ⊢ ∃xA(x)

Left rules:

   ∧-left:     Γ; (Ai) ⊢ C             Γ; (Bi) ⊢ C
               ---------------         ---------------
               Γ, (A∧Bj) ⊢ C           Γ, (A∧Bj) ⊢ C

   ∨-left:     Γ; (Ai) ⊢ C    Δ; (Bj) ⊢ C
               ----------------------------
                    Γ, Δ, A∨Bk ⊢ C

   →-left:     Γ ⊢ A    Δ; (Bj) ⊢ C
               ----------------------
               (Γ), Δ, (A→Bk) ⊢ C

   ∀-left:     Γ; (A(t)i) ⊢ B
               -----------------
               Γ, ∀xA(x)j ⊢ B

   ∃-left:     Γ; (A(a)i) ⊢ B
               ----------------- †
               Γ, ∃xA(x)j ⊢ B

* where a does not occur in Γ.    † where a does not occur in Γ, B.

Cut Rule:

   Γ ⊢ A    (Ai); Δ ⊢ B
   ----------------------
        (Γ), Δ ⊢ B
d' A;{AZ)\-B (r),AhB
is just another notation for the derivation d!. Empty applications of A-, —*-, V- and 3-left are treated similarly. Formulating these rules so as to take account of empty applications is just a convenient way of introducing some notation which will be useful later on. The use of sets in place of sequences of formulae makes a rule of interchange redundant. As for contraction, in this calculus it takes the form r,AjY- B and is not included among the basic rules because of the following:
188
Normalization, Cut-Elimination and the Theory of Proofs
Lemma A . l If d is a derivation ofT\Ai h B, then there is a derivation d(Ajfi) o/T, Aj h B which differs from d only in that some formula occurrences are assigned different indices. (In particular, there are no cuts in d(Aj/i) which are not already in d.) This lemma is proved by a straightforward induction on the construction ofd. Reduction steps are of three kinds: A. Elimination of trivial cuts B. Permuting cuts upwards C. Reducing the complexity of cuts. The following reduction steps are to be read from left to right. A. Elimination of trivial cuts: a.
d r hA
Ai\-A
d T\- A
r\-A b.
d Ai\- A T; Aj 1- B r,At\-B
diAi/j) T,Ai\-B
B. Permuting cuts upwards: These reductions divide into two groups according to whether the cutformula is passive on the right or on the left. (1)
Cut-formula a.
d
ei-,4
passive on the right d\ d
r;(4)KBA;(4)hC F,A;Ak\BAC e,r,Ah5AC d d2 Q\- A A;(Ak) VC (6) , A I - C (e),ri-B e.r.AhfiAC
d 6 h i
b. d T\-A
d\ T]{Ak)hB
d' A;Ak\-B A; Ak\-B'
A.rhs' where R is V- or 3-right.
d T\-A A,T\-
d' A;Aky-B B
A,ri-s'
A Strong Cut-Elimination Theorem for LJ c d F\-A
d' A; Ap; (Bq) V- C A;AP\-B^C r.AKB^C
189
d d'(Bm/q) T \-A A;ip;(Bm)hC r,A;(5m)hC r.AhB^C
where m does not occur as an index in d or d'. d. d D-B
d'(a) A; Bfc h A(a) A;BkhVxA(x) T, A I- Va;A(x)
d d'{b) T \-B A;Bk\-A{b) r , A h 4(6) r , A h V«i4(i)
where 6 is a parameter not occurring in d or d'(a). e. d Y\-D
di d-i A;(Dn);(Ap)hC 8;(D„);(B q ) h C A,9,iVBt;PBhC T,A,e,AVBk\-C d di(AT/p) r h - D A;(Dn);(Ar)\-C A,(r);WhC
d ThD
d2(Bs/q) 9;(D„);(B,)hC e,(r);(B,)hC
r,A,e,i4vBfchc where r, s do not occur in d, d\ or ^2. f.
d' d A;Ap;Bq\-C F\-B A,A'k;Bq\-CR A,r,4HC
d Th B
d'(Am/p) A;Am;BqhC A,T-Am\-C A,T,A'khCR
where m occurs nowhere in d or d', and R is A- or V-left. g. d e\-C
di d2 T;{Cm)\-A A;(Cm);Bk\-D T,A,A~>Bp;Cm^D r,A,e,A-+Bp\D d S\-C
dx d d2(Bq/k) r ; ( C m ) h ^ @\-C A;(Cm);Bq\-D r,(8)hi4 A,(@);Bq\-D T, A, 9 , A - • Bp h D
where q occurs nowhere in d or d2.
Normalization, Cut-Elimination and the Theory of Proofs h.
d'(a) d A;Ap;(B(a)q)\-C Tt-A A,3xB(x)r;Ap\-C A,3xB(x)T,r\-C
d FhA
d'(b)(B(b)s/r) A;Ap;(B(b)s)i-C A,T;(B(b)s)\-C A,3xB{x)r,V\-C
where 5 occurs nowhere in d or d'(b), and b is a parameter not occurring in d or d'(a). Cut-formula passive on the left a.
d\ d2 T;(AP)\-C A;(Bq)\-C T,A,AVBkl-C r,A,AVBk,Q\-D
d @;Cm\-D
d d2{Bs/q) d di(AT/p) r;(Ar)hC e;Cm\-D A;(BS)\-C 9 ; Cm \- D Q,T;(AT)\-D e, A; (Bs) r- D T,A,A\/ Bk,@\- D where r, s do not occur in d, d\ or d2. b.
d(Ak/p) A;Ak\-B A,T;Ak\-C
d A;AP\-B d' A,A'a^BR Bn;ThC A,A'q,T\-C
d! Bn;T\-C
A,A'q,r\-cR
where k occurs nowhere in d or d' and R is A- or V-left. c.
d\ d2 T\- A A;Bk\-C T,A,A^ BpbC r,A,A^Bp,
di
d Q;Cm\-D
e\-D d2(Bq/k) A-Bq\-C
d " ; Cm \-D
T\-A
r,A,A^Bp,et-D where q occurs nowhere in d or d2. d.
d(a) r ; (A(a)p) \- B d' r,3xA(x)q\-B A;Bm\-C A,3xA{x)q,T hC
d(b)(A(b)T/p) d' r ; (i(i) r )hg A;Bm\-C A,T; (A(b)r) h- C A,3xA(x)q,T \- C
where r occurs nowhere in d(b) or d', and b is a parameter not occurring in d' or d(a).
A Strong Cut-Elimination Theorem for LJ
191
C. Reducing the complexity of cuts: (1)
a.
di di YV A A\-B r,A\-AAB
d e;{AABk);Ap\-C 9;AABk\-C
e,r,A\-c dt T\- A
di d2 r\~A A\-B d(Aq/p) T,A\-AAB Q;(AABk);Ag\-C Q,(T,A);Aq\-C
e,r,(A)i-c where q does not occur in d, d\ or di. b. Like (la) with Bv instead of Av and 0*2 playing the role of d\. (2)
a.
d &^A Oh AVB
d\ d2 r;(AVBk);(Ap)\-C A;(AV Bk);(Bq)\-C r , A; A V Bk H C
r,A,ei-c d d
Q\-AVB
e\-A
r;(AVBk);(Aq)\-C
r,(ey,(Aq)\-c r, (e)»- c
where q occurs nowhere in d or d\. b. Like (2a) with B instead of A, and d2 playing the role of d\. (3)
d r ; (Ap) h B r\-A- ->B d T-AAP) \-B r\-A- -*B
d\ {A^Bk);A\-A
0Z2 (A -> Bk);G;Br
hC
A^Bk;A,Q\-C r,A,0(-C7 di
(A->Bk);A\-A ,A\-A T-{AP)\-B (r) T,{A)\-B / / T,(A)hB
T;(AP)\-B d2(Bq/r) T\-A-> B (A^Bk);@;Bq (V).,e;B,xvc r,(A),6hC
where q occur s nowhere in d, di or aV
\-c
192 (4)
Normalization, Cut-Elimination and the Theory of Proofs d(a) T h A(a) n-VxA(g)
d' A;(Vx(Ax)p);A(t)q\&;VxA{x)phB &,T\-B d(a) r I- A(a) T h VJA(X)
d(t)
r i- ACQ
B
d'(A(t)r/q) A; (VaA(g)p); i4(<), I- ff
A,(r) ; (i(t) r )hfi A,ri-B
where r occurs nowhere in d(a) or d'. (5)
d r h i4(Q r h 3xA(x)
d'(a) A; (3a;X(a;),); (4(a) p ) I- B A; 3ar^(x) g h g A,T\- B
d r I- i4(t)
d r i- i4(t) (fW^Wr/p) r h 3a:A(x) A; {3xA{x)q); {A(t)r) \- B A,(r);(/l(<)f)r5 A,(r)i-5
where r occurs nowhere in d or c?'(a). Remarks: (i) The reductions in group B are easy to describe. They simply allow a cut to be permuted upwards past applications of any other rule provided that the cut formula is passive. Unfortunately, there are a large number of cases to consider and I have not been able to find a notation that enables more than a couple of these to be amalgamated. No restrictions are placed on these permutations except to ensure that there are no clashes of indices, i.e., to ensure that a formula occurrence which is passive does not become active as a result of performing one of these reductions. This explains the need to reindex formulae in some of them. (In (Blc), for example, Bq might occur in T. The active occurrence of of Bq in d! must therefore be converted into B m , where m is some new index, before permuting the cut with —•right.) Such clashes are undesirable in principle since these reductions are supposed to be simple permutations and nothing more. In addition, they pose problems for strong cut-elimination.2 2
See the discussion of Gentzen's mix rule in Section 7.6 of Zucker, op. cit.
A Strong Cut-Elimination Theorem for LJ
193
(2) The statement of the reduction rules in group C could be simplified if the left rules were reformulated in such a way that the formula introduced had to be assigned a new index. Contraction would then no longer be a derived rule, but could be added as a basic one. This in turn would necessitate adding a further group of reductions for permuting contractions downwards past other inferences, and would result in a slightly different set of choices for reduction procedures. Such a modification would not be of much significance, but there is little incentive to make it. It probably complicates matters more than it simplifies them and, in addition, the new restriction on indices is difficult to motivate. What cannot be required is that subderivations of a given derivation have no index in common—for example, that T and A be disjoint in the statement of A-right. Zucker's example of a non-terminating (proper) reduction sequence depends upon just this point. (3) If d terminates with a cut and d! is obtained from d by applying one of the above reductions to its final inference, then d! is said to come from d by a 'primitive reduction. In general, we say that d reduces (in one step) to d! if d' can be obtained from d by replacing one of its subderivations by a derivation which comes from it by a primitive reduction. When the primitive reduction in question is from group C, the conclusion of the new subderivation need not coincide with that of the subderivation from which it comes, with the result that some members of the succeeding chain of inferences may become empty. The notation introduced earlier is intended to make clear how such inferences are to be treated. It obviates the need to give a separate definition of what Zucker has called pruning, 3 and has the effect of eliminating all inferences made redundant by the reduction except for applications of V- and 3-left. This result is easily seen to be the same as would be obtained by adapting Zucker's definition to the present context. In according special treatment to redundant applications of V- and 3-left I am following Zucker's example. His only motivation for doing so seems to be that it facilitates the comparison with normalization procedures for natural deduction derivations.
It has been shown by Dragalin that every reduction sequence constructed according to A, B and C above must be finite in length. 4 In 3
Zucker, op. cit, pages 44-47. "A strong theorem on normalization of derivations in Gentzen's sequent calculus," Studies in the theory of algorithms and mathematical logic, ed. by A. A. Markov and V. I. Khomich, "Nauka," Moscow, 1979, pp. 26-39 (Russian). An English translation of the proof appears as Appendix B of his monograph Mathematical Intuitionism: introduction 4
194
Normalization, Cut-Elimination and the Theory of Proofs
fact, he establishes this result for a version of LK, not just for LJ. There are a number of differences between his version of the sequent calculus and the system described above, most of them rather minor. In the first place, his sequents are constructed from lists of formulae, where a list is explained as a finite set with repetitions so that, although a formula may have more than one occurrence in a list, the order of formula occurrences does not matter. In the second place, his rules include thinning and contraction as well as rules for negation (rather than axioms for ±). Finally, his calculus employs a mix rule instead of cut, i.e., a rule which removes every occurrence of the cut formula from a list. As for his reduction steps, they do not coincide exactly with those listed above. They include, of course, the reductions needed for contraction, thinning and the negation rules. In addition, however, each reduction listed under C above is replaced by a pair of reductions whose applicability depends upon whether the cut formula introduced into each premise of the cut by the preceding inference already occurs in the premise(s) of that inference or not. To handle the former case, there are reductions which allow the cut to be applied before the inference in question to remove these prior occurrences, and then reapplied after it to remove the new occurrence of the cut formula. In the latter case, the familiar steps for replacing the cut by a cut or cuts of lower degree are used. This results in a slightly more flexible reduction procedure. Although these various differences affect some details of the proof, none of them is of much significance. In my account, I have tried to follow Zucker as closely as possible, except for the reductions in C which have been chosen to simplify the exposition. Dragalin's proof is suggested by Prawitz's proof of the analogous result for natural deduction. An inductively defined property of derivations (an analogue of "strong validity" or "computability") is introduced such that derivations with this property are easily shown to generate only reduction sequences of finite length. The work of the proof consists in establishing—by induction on the definition (amongst other things)—that all derivations have the property. Call the derivation(s) of the premise(s) of the final inference of a derivation d its immediate subderivation(s). Definition A . l (1) A derivation d is said to be inductive if a. d is an axiom. b. the last inference of d is not cut and the immediate subderivation(s) of d is (are) inductive. c. the last inference of d is cut and every derivation to which d reduces in one step is inductive. (2) The inductive complexity of d is defined as follows: to proof theory, Vol. 67 of the AMS series "Translations of Mathematical Monographs," Providence, 1988, pp. 185-200.
A Strong Cut-Elimination Theorem for LJ
195
a. if d is an axiom it has inductive complexity 1. b. if the last inference of d is not cut, its inductive complexity is one more than the inductive complexity (sum of the inductive complexities) of its immediate subderivation(s). c. if the last inference of d is cut, then its inductive complexity is one more than the sum of the inductive complexities of the derivations to which d reduces in one step. It is now a straightforward matter to prove by induction on inductive complexity that there is no infinite reduction sequence beginning with an inductive derivation d. If d is an axiom, there is nothing to prove. If d terminates with an inference other than cut, it follows from the elementary properties of the reduction steps that such a sequence must contain an infinite subsequence of reductions applied to (one of) the immediate subderivation(s) of d—contrary to the induction hypothesis. Finally, if d terminates with a cut, the result follows immediately from the induction hypothesis. It remains to argue that every derivation is inductive. This too is a straightforward induction—here, on the construction of d—provided that the derivation which results from applying cut to the conclusions of a pair of inductive derivations can itself be shown to be inductive. Dragalin calls such a derivation, i.e., one terminating with a cut whose immediate subderivations are both inductive, a regular figure. So, for strong cut-elimination, it is enough to show: Lemma A.2 Every regular figure d is inductive. Lemma A.2 is proved by induction on the pair (a, (3), where a is the degree (or logical complexity) of the cut formula of the terminal cut of d and (3 is the sum of the inductive complexities of its immediate subderivations. The proof is a matter of verifying that every derivation d! to which d reduces in one step is inductive—using the induction hypothesis as needed and the fact that the result of applying a reduction step to an inductive derivation is itself an inductive derivation of lower inductive complexity. Let us designate the final inference of d by C, and its right immediate subderivation by d". There are a number of cases to consider: (i) d' comes from d by reducing a cut other than C. If C disappears as a result, it follows from the regularity of d that d! is inductive. If it does not, we argue that the inductive complexity of the immediate subderivation containing the reduced cut has been lowered and apply the induction hypothesis. (2) C is a trivial cut and d! comes from d by eliminating it with a type A reduction. Here, the inductiveness of d' follows immediately from the hypothesis that d is regular.
196
Normalization, Cut-Elimination and the Theory of Proofs
(3) The cut formula is passive in at least one premise of C and d! is obtained by using a type B reduction to permute C. In this case we argue first that the immediate subderivation(s) of d! terminating with C (or the pair of cuts into which C has been split by the reduction) is (are) regular and that one of its (their) immediate subderi vat ions is of lower inductive complexity than the corresponding immediate subderivation of d. Next we apply the induction hypothesis to infer the inductiveness of this (these) subderivation(s) and, finally, use the fact that d' is obtained from an inductive derivation (or two such) by a rule other than cut to infer that it too must be inductive. (4) The last case is that d! comes from d by applying a proper reduction to d. In this case, we argue first that the immediate subderivations of d' (or the subderivations obtained from df by removing the last two cuts in the case of (C3)) are inductive. This follows either by the same argument as in the preceding case or from the regularity of d—depending upon whether or not the cut formula of C is already present in the conclusion of an immediate subderivation of d". We can then appeal to the fact that the final cut of d! (or last two cuts in the case of (C3)) is (are) of lower degree than C and infer from the induction hypothesis that d' is inductive. From the argument outlined above we can conclude: Theorem A.3 (Dragalin) Every reduction sequence generated by the steps in A, B and C above is finite. Although this is a satisfying result, it falls short of what we would like to be able to claim. Dragalin rightly points out that the reduction steps he has selected are those employed by Gentzen in his original proof of cutelimination. On the other hand, Gentzen's procedure, according to which cuts are eliminated from top to bottom in a derivation,5 is often modified in such a way that cuts are systematically reduced by degrees, beginning with those of highest degree.6 This procedure may require a cut to be permuted upwards past other cuts of lower degree. Furthermore, Zucker allows cuts to be permuted freely with one another since, as he shows, such permutation variants will still be mapped onto the same natural deduction derivation and, hence, may plausibly be claimed to represent the same proof. There is no difficulty about formulating steps for permuting cuts with cuts. 5
See Chapter 1 above. See, for example, "Proof Theory: Some Applications of Cut-Elimination" by H. Schwichtenberg, Handbook of Mathematical Logic, ed. by J. Barwise, pp. 867-895. 6
A Strong Cut-Elimination Theorem for LJ
197
To the reductions in group (Bl), cut-formula passive on the right, we add the following: i. d T\-A
d\ d
A,r,0h£> d r h i
d\ A;(Afc)hC A,(r)hC
d T\-A
d2{Cq/p) Q;Cq;(Ak)\-D e,(T);Cq\-D
A,e,rh£> where q occurs nowhere in d or d2. and to those in (B2), cut-formula passive on the left, e.
d\ di TV A A^AVB d T,Ah£ Bq,
2{Ar/p) d Ar;A\- B Bq;Q\- D Ar;A,e\-D r,A,9h£) where r occurs nowhere in d or d2. di TV A
The only problem is that the inclusion of (B1 i) and (B2 e) among the reduction steps obviously opens up the possibility of infinite reduction sequences: they allow successive cuts to be permuted with one another ad infinitum. All is not lost, however, since we may hope to salvage the result by utilizing Zucker's notion of a proper reduction sequence, i.e., one without infinite repetitions. Although (B2 e) may increase the number of cuts in a derivation, no proper infinite sequences can be generated by permuting cuts with one another. This can be seen from the following considerations.

Definition A.2 The power of a formula occurrence on the left of a sequent in a derivation is defined by induction on the rules:7

(1) The formula occurrence on the left of an axiom has power 1.
(2) The formulae on the left in the conclusion of an application of a right rule have the (sum of the) power(s) of their occurrence(s) in its premise(s).
(3) The passive formulae in the conclusion of an application of ∧- or ∀-left have the same power as their occurrences in its premise. The active formula has the power of its occurrence (if any) in the premise plus the power of the formula occurrence from which it was obtained by the rule.
(4) The passive formulae in the conclusion of an application of ∃-left have the same power as their occurrences in its premise. The active formula has the power of its occurrence (if any) in the premise plus one.
(5) The passive formulae in the conclusion of an application of ∨-left have the sum of the powers of their occurrences in its premises. The active formula has the sum of the powers of its occurrences (if any) in the premises plus one.
(6) Let n be the power of the formula occurrence in the right premise of an application of →-left which is operated on by the rule. The passive formulae in the conclusion of this application have the power of their occurrences in the right premise plus n times the power of their occurrences in the left premise. The active formula in the conclusion has the power of its occurrence (if any) in the right premise plus n times the power of its occurrence (if any) in the left premise plus n.
(7) Let n be the power of the occurrence of the cut formula in the right premise of an application of cut. The formulae in the conclusion of this application have the power of their occurrences in its right premise plus n times the power of their occurrences in its left premise.

7 Recall that occurrences of the same formula with different indices are counted as different formula occurrences.
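For instance (with numbers invented purely for illustration): if the occurrence of the cut formula in the right premise of a cut has power n = 2, and some indexed formula B_m occurs on the left of both premises, with power 3 in the left premise and power 1 in the right premise, then by clause (7) the occurrence of B_m on the left of the conclusion has power 1 + 2·3 = 7.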
Intuitively, the power of a formula occurring on the left of the conclusion of d is the cardinality of its corresponding assumption class in φ(d).

Definition A.3 The weight of an application of cut in a derivation d is defined as follows:

(1) If the last inference of d is not cut, then each cut in d has the same weight as it does in the immediate subderivation of d in which it occurs.
(2) If the last inference of d is cut and the cut formula in its right premise has power n, then the weights of the cuts in the right immediate subderivation of d are unchanged. Those in the left immediate subderivation have their weights multiplied by n, and the final cut in d is assigned n as its weight.

Again, the motivation for this definition comes from the mapping
(2) If the final cut of d is split by an application of (B2 e), and with it all the cuts in the left immediate subderivation of d, then the weight of each such cut in d is equal to the sum of the weights of the two cuts which replace it.

It follows from (2) that the weight of a cut provides a bound on the number of cuts into which it can be split by repeated applications of (B2 e). Furthermore, (1) guarantees that, when cuts are split in a derivation, no cut occurring below them has its weight increased as a result. Hence the sum of the weights of the cuts occurring in a derivation fixes a bound for the derivation as a whole. It is obvious, however, that no infinite non-repeating sequence can be generated by applications of (B1 i) and non-splitting applications of (B2 e). We can conclude therefore:

Lemma A.4 No infinite sequence of applications of (B1 i) or (B2 e) is proper.8

This is in fact the only point in the discussion which depends upon the distinctive property of LJ. If more than one formula is allowed to occur on the right of a sequent, infinite non-repeating sequences are easily constructed—as the example in Chapter 7 above illustrates. Lemma A.4 does not imply that strong cut-elimination (in the sense that every proper reduction sequence is finite) holds when the reductions in A, B and C above are augmented by steps for permuting cuts with one another. In fact, it is not clear whether Dragalin's argument can be adapted to this new situation. On the other hand, Zucker's proof for the negative fragment, which allows for such permutations, depends upon translating cut-elimination steps for LJ into the familiar normalization steps for NJ. (B2a) and (B2d) above do not translate in this way, however, and, although his methods can be extended to take account of (B2d),9 disjunction remains a problem. The upshot of this discussion is that strong cut-elimination does hold relative to the reduction steps described above, subject to certain restrictions. It seems likely that these restrictions—whether to fragments of LJ or on the permutation of cuts—can be removed, but at present this is just a conjecture.
8 This does not follow immediately from Zucker's results because his indexing system precludes the possibility of splitting up a cut when it is permuted upwards past another one.
9 See Chapter 3 above.
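To make the bookkeeping of Definition A.3 concrete, here is a small illustrative sketch in Haskell (the names and simplifications are mine; this is not the calculus itself). A derivation is collapsed to the only structure the weight assignment cares about: where the cuts sit, and what power the cut formula has in each right premise.

    -- A toy model of Definition A.3.  'Other' stands for any inference that
    -- is not a cut; 'Cut n dl dr' is a cut whose cut formula has power n in
    -- the right premise, with left and right immediate subderivations dl, dr.
    data Deriv
      = Other [Deriv]
      | Cut Integer Deriv Deriv
      deriving Show

    -- Weights of all cuts in a derivation.
    -- Clause (1): a non-cut inference leaves the weights of its subderivations alone.
    -- Clause (2): a final cut of power n leaves the right-hand weights alone,
    -- multiplies the left-hand weights by n, and itself receives weight n.
    weights :: Deriv -> [Integer]
    weights (Other ds)    = concatMap weights ds
    weights (Cut n dl dr) = n : map (n *) (weights dl) ++ weights dr

    -- The sum of the weights bounds how many cuts repeated applications of the
    -- splitting step (B2 e) can produce, since splitting a cut replaces its
    -- weight by two weights summing to it and never raises the weights below.
    weightBound :: Deriv -> Integer
    weightBound = sum . weights

For example, weights (Cut 2 (Cut 3 (Other []) (Other [])) (Other [])) evaluates to [2, 6]: the final cut receives weight 2, and the cut in its left subderivation has its weight 3 multiplied by 2, in accordance with clause (2).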
Appendix B
A Formulation of the Classical Sequent Calculus

Axioms:

    A_i ⊢ A_j

Logical Rules:

∧-right:
    Γ ⊢ Δ; A_i     Γ′ ⊢ Δ′; B_j
    -----------------------------
    Γ, Γ′ ⊢ Δ, Δ′, A ∧ B_k

∧-left (two forms):
    Γ; A_j ⊢ Δ              Γ; B_j ⊢ Δ
    ----------------        ----------------
    Γ, A ∧ B_k ⊢ Δ          Γ, A ∧ B_k ⊢ Δ

∨-right (two forms):
    Γ ⊢ Δ; A_j              Γ ⊢ Δ; B_j
    ----------------        ----------------
    Γ ⊢ Δ, A ∨ B_k          Γ ⊢ Δ, A ∨ B_k

∨-left:
    Γ; (A_i) ⊢ Δ     Γ′; (B_j) ⊢ Δ′
    ---------------------------------
    Γ, Γ′, A ∨ B_k ⊢ Δ, Δ′

→-right:
    Γ; (A_i) ⊢ B_j, Δ
    ------------------
    Γ ⊢ A → B_k, Δ

→-left:
    Γ ⊢ Δ; A_j     B_j; Γ′ ⊢ Δ′
    -----------------------------
    Γ, Γ′, A → B_k ⊢ Δ, Δ′

¬-right:
    Γ; A_j ⊢ Δ
    --------------
    Γ ⊢ ¬A_k, Δ

¬-left:
    Γ ⊢ A_i; Δ
    --------------
    Γ, ¬A_k ⊢ Δ

∀-right:
    Γ ⊢ A(a)_i; Δ
    ------------------ *
    Γ ⊢ ∀xA(x)_k, Δ

∀-left:
    Γ; A(t)_j ⊢ Δ
    ------------------
    Γ, ∀xA(x)_k ⊢ Δ

∃-right:
    Γ ⊢ Δ; A(t)_j
    ------------------
    Γ ⊢ Δ, ∃xA(x)_k

∃-left:
    Γ; (A(a)_i) ⊢ Δ
    ------------------ *
    Γ, ∃xA(x)_k ⊢ Δ

* where a does not occur in Γ or Δ. (An alternative possibility is to replace the rules for ¬ by axioms of the form ⊥_i ⊢ P_j—where P is atomic and different from ⊥.)

Cut Rule:
    Γ ⊢ Δ; A_j     A_j; Γ′ ⊢ Δ′
    -----------------------------
    Γ, Γ′ ⊢ Δ, Δ′

Thinning Rules:

Right (a):
    Γ ⊢ Δ; B_k
    ------------------
    Γ ⊢ Δ, B_j, A_i

Right (b):
    B_k; Γ ⊢ Δ
    ------------------
    B_j, Γ ⊢ Δ, A_i

Left (a):
    B_k; Γ ⊢ Δ
    ------------------
    A_i, B_j, Γ ⊢ Δ

Left (b):
    Γ ⊢ Δ; B_k
    ------------------
    A_i, Γ ⊢ Δ, B_j
Notation: The notational conventions are as before; in particular:

• Γ, Δ, ... range over sets of indexed formulae, and i, j, k, ... over indices.
• I will write Γ, Δ for Γ ∪ Δ and Γ, A_i for Γ ∪ {A_i}; this notation is not intended to imply either that Γ ∩ Δ = ∅ or that A_i ∉ Γ. When I do want to indicate this, I will use Γ; Δ and Γ; A_i, respectively.

In the cut-elimination steps below,

• Γ; (A_i) will be used to denote ambiguously Γ; A_i and Γ. (In the latter case, it is assumed that A_i ∉ Γ.)
• Similarly Γ, (A_i) will be used to denote Γ, A_i when it is left open whether or not A_i ∈ Γ.
• The notations Γ; (Δ) and Γ, (Δ) are explained in an analogous way.

As before, we can prove

Lemma B.1 If d is a derivation of Γ; A_i ⊢ Δ; B_j, then there are derivations (A_k/i)d of Γ, A_k ⊢ Δ; B_j and d(B_n/j) of Γ; A_i ⊢ Δ, B_n which differ from d only in that some formula occurrences are assigned different indices. (In particular, no cuts are introduced, and d, (A_k/i)d and d(B_n/j) all have the same length.)

Proof. A routine induction on the length of d. □
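As a minimal sketch of the data that Lemma B.1 manipulates (the types and names below are mine, not the book's), sequents can be modelled as pairs of finite sets of indexed formulas, and the operations (A_k/i)d and d(B_n/j) become index substitutions applied throughout a derivation:

    import qualified Data.Set as Set
    import Data.Set (Set)

    -- Formulas are left abstract; an indexed formula pairs a formula with an index.
    type Formula = String
    type Indexed = (Formula, Int)

    -- Antecedent and succedent are sets of indexed formulas, so no interchange
    -- rule is needed and contraction is just a coincidence of indices.
    data Sequent = Sequent { ante :: Set Indexed, succ' :: Set Indexed }
      deriving (Eq, Show)

    -- Re-index a single occurrence: (A_k/i) replaces the index i on A by k.
    reindex :: Formula -> Int -> Int -> Set Indexed -> Set Indexed
    reindex a i k = Set.map swap
      where
        swap (b, j)
          | b == a && j == i = (b, k)
          | otherwise        = (b, j)

    -- Applied to every sequent of a derivation, these give (A_k/i)d and d(B_n/j).
    leftContract, rightContract :: Formula -> Int -> Int -> Sequent -> Sequent
    leftContract  a i k (Sequent l r) = Sequent (reindex a i k l) r
    rightContract a i k (Sequent l r) = Sequent l (reindex a i k r)

If A_k already occurs in the antecedent, the two occurrences simply coincide after the substitution; and since nothing is added or removed, the length of the derivation is unchanged, as the lemma states.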
(A_k/i)d is said to be obtained from d by left contraction and d(B_n/j) by right contraction. Gentzen's formulation of LK, described in Chapter 1 above, differs from the version presented here in a number of respects. In particular, he uses sequences of unindexed formulae where I have used sets of indexed ones and, as a result, is obliged to introduce interchange rules which vary the order of their terms; also, the premises of his thinning rules contain no active formula. Even the reduction steps listed below do not coincide exactly with his. These various differences arise for the most part from a difference
of interest. It seems fair to say that Gentzen was concerned only about proving a normal-form theorem in a sufficiently elementary way. On the other hand, I am interested here in the relationship between the sequent calculus and natural deduction, and between a derivation and its cut-free forms. For this reason, I have tried to formulate the calculus so that the interpretation of its derivations as instructions for constructing natural deduction ones can be extended in a reasonably natural way from LJ to LK and from a calculus without thinning to one with. I have also tried to arrange matters in such a way that the transformation of a derivation into cut-free form preserves as much of its original structure as possible. These considerations help to motivate the cut-elimination steps which follow.

A. Elimination of Trivial Cuts
B. Reducing the Complexity of Cuts
C. Permuting Cuts Upwards

The reductions in these three groups require little comment or explanation. They do not differ significantly from the steps which figure in Gentzen's proof of the Hauptsatz except that I have insisted in B on applying any necessary thinnings to the premises of the reduced cut rather than to its conclusion. This is simply to facilitate comparison with the reductions in D and does not really complicate matters.

D. Splitting Up Cuts

These too are direct analogues of reductions used in Gentzen's proof (in the induction on rank, when the mix formula is introduced by the preceding inference). They could be dispensed with, as they are in Appendix A, at the cost of complicating the steps described in B, but it seems more perspicuous to list them separately. They could also be made redundant if each formula introduced into a sequent were assigned a new index, although it would then become necessary to include contraction as a basic rule and to replace the steps in D by contraction conversions of various kinds.1 The possibility of doing without contraction, while insisting that each formula must be introduced with a new index, is ruled out by the considerations in Chapter 4 above.2 It can only be realized if cut is replaced by a sort of mix rule which operates on all occurrences of the formula to be cut regardless of their indices. There is no advantage to be gained by introducing such a rule, however, since its elimination inevitably involves reduction steps analogous to those in D.

1 This is essentially the approach adopted by Zucker for LJ in his paper "On the Correspondence between Cut-Elimination and Normalization."
2 See Chapter 8 of Multiple-Conclusion Logic by Smiley and Shoesmith for an argument which in effect demonstrates the inadequacy of such a system.
E. Elimination of Cuts with a Thinned Premise

These appear in Gentzen too, although his formulation of thinning makes them slightly simpler to state.

F. Thinning Permutations

These have no analogues of any kind in Gentzen. They constitute a not altogether satisfactory solution to the problem posed by the need to associate each formula introduced by an application of thinning with an active formula in the premise of the rule. (I chose to formulate the thinning rules in this way in order to make them interpretable as operations on derivations of D. The only non-arbitrary interpretation of Gentzen's version of thinning seems to be in terms of derivations which are not necessarily connected graphs.) If the active formula in the premise of a thinning becomes (possibly after resubscripting) a cut formula in its conclusion, then it will disappear when the cut is permuted upwards, leaving the thinned formula stranded. The problem is to provide a systematic way of finding a new formula to replace the active one which is removed by the cut. The expedient I have adopted is to allow thinnings to be permuted upwards until they are applied only to axioms, at which point an obvious solution presents itself. It might be argued that this is a much more complicated procedure than the needs of cut-elimination demand. This is no doubt true, but it does seem necessary to preserve the hope of establishing an interesting connection between the various cut-free forms of a given derivation. Of course, cut-free forms are not unique, and to make them so, even in some weak sense of equivalent modulo certain permutations of inference, would require a much more conservative treatment of thinnings in the reductions in B and E.3 To handle thinnings with care only in F represents therefore a rather uneasy compromise. I know of no entirely satisfactory way to deal with these inferences, however, and hence did not feel justified in reformulating the other reduction steps.

3 This topic is discussed further in Chapter 7 above.

G. Additional Thinning Permutations

Again, these have no analogues in Gentzen, nor are they needed for cut-elimination. Their only purpose is to justify the notation introduced in convention (1) below. The point of the notation itself is to minimize the arbitrariness in the use of thinnings in B by making the order in which they are applied unimportant.

Comments: First, there is one obvious difference between Gentzen's cut-elimination steps and those listed below that I have not stressed; it is that his are written for mix rather than cut. The distinction between these two rules
becomes a little blurred when sets replace sequences. This is because mix is needed only to deal with interchange. (I realize that it helps a little with contraction too, but one can just as easily manage without it as far as this rule is concerned.) Since interchange is not merely redundant, but makes no sense in the version of LK which I have described, the distinction between mix and cut loses some of its importance as well. Put differently, the calculus presented above can be compared to one which differs from Gentzen's in allowing the rules to operate upon any appropriate formula occurrence on the left [right] of a sequent instead of only on the leftmost [rightmost] one. Mix then becomes equivalent to cut or to a series of cuts—not to a combination of cut and the other structural rules. The cut rule, however, has one important advantage over mix from my point of view. It is that the reduction steps for the former preserve the structure of the derivation to which they are applied far better than those for the latter. (I am thinking particularly of reductions of the sort described in group C, which presumably would allow a mix to be permuted upwards past an inference whose premise contained an active occurrence of the mix formula.) Gentzen, on the other hand, seems not to have been concerned with this issue.

Second, if the alternative formulation of LK—that is, without rules for negation—is adopted, all the reduction steps pertaining to ¬-right and ¬-left can be omitted. No additional steps are made necessary by the inclusion of axioms for ⊥.

Conventions:

(1) Given a derivation d of Γ; A_i ⊢ Δ, I will write

    d
    Γ; A_i ⊢ Δ
    Γ′, Γ; A_k ⊢ Δ, Δ′

for the derivation of Γ′, Γ, A_k ⊢ Δ, Δ′ obtained from (A_k/i)d by a series of thinnings applied to A_k. (When using this notation, I will always choose k so that A_k ∉ Γ, Γ′.) The dual notation, with A_i on the right of the sequent instead of the left, is explained in an analogous way.4

(2) If (B_j); Γ = Γ, then both

    d                            d
    (B_j); Γ ⊢ Δ        and      (B_j); Γ ⊢ Δ
    (B_k), (C_n), Γ ⊢ Δ          (B_k), Γ ⊢ Δ, (C_n)

4 Strictly speaking, the notation introduced here is not well-defined. But, whenever it happens to denote more than one derivation, they will be obtainable from one another by the permutations in G.
denote the derivation d of Γ ⊢ Δ; otherwise they denote

    d                         d
    (B_j); Γ ⊢ Δ     and      (B_j); Γ ⊢ Δ
    B_k, C_n, Γ ⊢ Δ           B_k, Γ ⊢ Δ, C_n

respectively. The dual notation, with B_j, B_k and C_n changing sides, is explained in an analogous way.

(3) If (A_i); Γ′ = Γ′, then

    d                  d′
    Γ ⊢ Δ; A_i         (A_i); Γ′ ⊢ Δ′
    (Γ), Γ′ ⊢ (Δ), Δ′

is just the derivation d′ of Γ′ ⊢ Δ′.

(4) The letters s, t, u and v will be reserved for subscripts that occur only where explicitly indicated in the figures of which they are a part.

(5) In A-E below, the figure on the left reduces to the one on the right; the permutations in F and G are symmetrical.

The Cut-Elimination Steps

A. Elimination of Trivial Cuts

(1)
d T\-A;Ai r\-A,Aj
(2) Ajh Aj
AAj/i)
Ai\-Aj
r h A, A,
d A,;rhA
(Ai/j)d
^,ri-A B. Reducing the Complexity of Cuts (1)
a.
di
^2
n - A ; 4 j r ' h A';Bj r,r'l-A,A';^ASfc
^3
Ap;V" h A" AABk;T"\-A"
r,r',r"h A,A',A" di T\-A;Ai r,FI-A;,4s
dz r";,4pl-A" r";A8hA',A"
r,r,r"hA,A',A" b. Like (la) with di and d\ interchanged, and B A Ak in place of AABk.
206 (2)
Normalization, Cut-Elimination and the Theory of Proofs a.
di ThAjfii T\-A;AWBk
di Ap;r'\-A'
dz Bq;r"\-A" AVBk;T',T"\-Al,A"
r,r,r"hA,A',A" d\ T h A; Bj r,T'\-A;Bs
dz T"\Bqr- A" r";Bs\-A',A"
r,r,r"hA,A',A" b. Like (2a) with d3 and d2 interchanged, and B V Ak in place of AvBk. (3)
a.
di T;AP\- A;Bq T\-A;A-+Bk
di dz r'hA';A Bj;r"hA" A^Bk;T',T"hA',A"
r,r',r"hA,A',A" d2{As/i) (As/p)di(Bt/q) r'hA';As As;r\-A;Bt r,n-A,A';B,
(Bt/j)d3 BuF'hA"
r,r',r"h A,A' b.
di
T\-A;Bq T\-A;A-+Bk
dz S 7 ; T" h A" ^-B fe; r',r" h A ' ,A" T " r,r, h A , A ' , A " d2 r'hA';4j
dz
ThA;Bq_ _B,-;r"J:_^ // r,r'i-A;B, st;r"hA', A" r,r,r"hA,A',A"
(4)
dx rFA;A(q), T h A; Vgi4(a)fc
r,r'h d'(t) r\-A;A(t)s
d2 #)P;r'hA' Vx^(ar)fc; T h A' A, A' (A(t)s/p)d2 A(t)s;F \-A'
r,r'hA,A' where d! = di(A(a)s/i) (5)
di r\-A;A(t)j T I- A; 3xA(x)k
d(a) 2 A{a)p; V h A' 3xA{x)k; F \- A'
r,r'h A, A'
A Formulation of the Classical Sequent Calculus dM^s/i) T\-A;A(t)s
d"{t) A(t)s;r'hA'
r,r't- A, A' where d" = (A(a)s/P)d2 (6)
di Aj-,r\-A
d2 V \- A'; Ap
rhApA t n ^ r h A ' r,r'h A, A'
d2(As/p) T'\-A';AS
(As/i)di AS;T \-A
r, r \- A, A'
C. Permuting Cuts Upwards Cut-formula passive on the right: (1)
d2 d3 rfi (^rhA'jgj (Ai);T"\-A";Ck rhA;ij Aj-T'X'h A',A",BhCm r,r,r"hA,A',A",BACm d\ d2(Bs/j) di ds{Ct/k) T\-A;A% {Ai);r'\-A';Bt T\-A;Aj (A^T" \-A";Ct (r),r'h(A),A';g. (r),r"h(A),A";C t r,r',r"hA,A',A", J BAC m
(2) di T\-A;Ai
d2 d3 (^iBjiPrA' (^);C fc; r"KA" ii;BVCm,r',r"hA',A" Bvc*m,r,r',r"hA,A',A"
di (Bs/j)d2 di {Ct/k)dz ThA;At (Aj);BS;F h A' ThA;At (A,);Ct;T" \-A" Ct;(r),r"h(A),A" fi,;(r),r'h(A),A' Bvcm,r,r',r"F- A, A', A"
(3)
rf2 di rhA;ij
rf3
(^rHA'jBj W;Ct;rhA" A;B^Cm,r',r"hA',A" B-*cm,r,r',r"hA,A',A"
di
d2(Bs/j)
r\-A;Aj ( ^ r h A ' j B ,
di
rhA^i
(Ct/k)d3
(A);C;r"hA"
(r),rh(A),A';B < C t ;(r),r"h(A),A" 5-»cm,r,r',r"KA,A',A"
(4)
d2
dx ry-A-Ai
d3
(Ai)-,r'\-A';Bi (A^BfX' 4r',r"hA',A"
H A"
r,r,r"hA,A',A" d\ d2{Ba/j) T\-A;Ai (Aiy,r'\-A';BS (r),r'H(A),A';B,
di ThAjAj
(Bs/j)d3 (Aj); Ba;T" \-A" B3;(T),r"\-(A),A"
r,r',r"t-A,A',A" (5) dx rhA;^
d2 ^;r'hA';5j AJ;T"\-A"
R
r,r"hA,A"
di d2{Bs/j) r h A ; ^ ^ ; r'hA';g a r,r\-A,A'-,Bs
r,r"hA,A"
R
where R is ∨-right, ∃-right, right thinning (a), left thinning (b) or ¬-left (applied to B_j in the left-hand figure, and to B_s in the right-hand one).
(6) dj. r\-A;Aj
d2 A^B^T'hA' AJ;T"\-A"
^
dx r\-A;Aj
R
r,r"i-A,A"
(Bs/j)d2 Ai;BS;T' h A' Bs;T,r'\-A,A'
r,r"i-A,A"
R
where R is ∧-left, ∀-left, left thinning (a), right thinning (b) or ¬-right (applied to B_j in the left-hand figure, and to B_s in the right-hand one).
(7)
(8)
d2
dY
(Bs/j)d2(Ct/k)
<*! A^BfX^C^A1 r\-A;Ai AiS'hB^C^A' r,r'hB-,c ra ,A,A'
ri-A;^ Ai;Bs;T' I- Ct; A' BS-YJ' h Ct; A, A' r,r'hB-.c m ,A,A'
d2{a) dx A^Bja^r'hA' r\-A;Aj Aj;3xB(x)m,r' hA' 3xB(x)m, T, T h A, A'
dx d2{b) r\-A;Aj A^Bjb^r'\-A' B(b)f,r,r h A, A' 3a;S( a ;) m , I \ V h A, A' where b occurs nowhere in di or d2.
A Formulation of the Classical Sequent Calculus (9)
d2(a) AyV \- B{a)yA' Ai;r'\-VxB(x)k,A'
di r\-A;Aj
di d2(b) rh-A;^ Aj-J'\-B{b)yA' T,T' \- B(b)y A, A'
r, r i- vxB(x)k, A, A'
r, r f- vxB(x)k, A, A' where b occurs nowhere in d\ or d2.
Cut-formula passive on the left: (10)-(18). These are just the duals of (1)-(9) above. In each case, put A_i on the left-hand side of the conclusion of d_1 and on the right-hand side[s] of the conclusion[s] of d_2 [and d_3], then rearrange the premises of the cut accordingly. By way of illustration, I have written out case (14) below. The remaining ones are left to the reader. (14)
d2 T'hA'-^yAj Y"hA"\Aj
dr ThA
R
r,r"i-A,A"
d2(Bs/j) dx T'bA';Bs;Ai AyThA r,r'hA,A';g8 R
r,r"i-A,A"
where R is ∨-right, ∃-right, right thinning (a), left thinning (b) or ¬-left (applied to B_j in the left-hand figure, and to B_s in the right-hand one).

D. Splitting Up Cuts

Cut-formula active on the right

(1)
    d_1  Γ ⊢ Δ; A ∨ B_m
d2 d3 Ay (A V Bm); r h A' By (A V Bm); T" h A" A\JBm,A\/Bm;T',T"y A\A"
r,r,r"hA,A',A" d2 dz At; (A V Bm); V \- A' By, (A V Bm); T" h A" iVB8;iV5m;r',r"hA',A" di
d!{A\/Bs/m) ri-A;.4Vffg
T\-A;AwBm AVBs;r,T',r"\-A,A',A"
r,r',r"h (2) di T\-A;A^Bm
A,A',A"
d2 dz (A -> g m ) ; ^ , h A';Aj By (A - Bm);T" h A" A^Bm,A^Bm;T',T"\-A',A" T,r',T"\-A,A',A"
Normalization, Cut-Elimination and the Theory of Proofs (A -» Bm); r h A'; At Bf, {A - Bm); T" h A" A^Bs;A-^Bm;T',T"\-A',A"
dM^Bt/n) rh A\A->B,
T\-A;A^B, A - ^ J 9 8 ; r , r ' , r " l - A , A ' /, AA «
r,r,r"hA,A',A" (3)
d2 Ai;r'\-A' A. . ^ f h A "
dl
r 1- A; Ai
„ 1 1
r,r"^A,A" di(As/i) ThA;^s
rf2 d, ^;r'hA' rhA;i, Ai;As;T"\-A"11 yls;r,r"hA,A"
r,r"i- A, A" where R is A-, V-, ->-left, left thinning (a) or (b)—provided that if R is left thinning (a), Ai is not the active formula in its premise. (4) di r\-A;Az
d2 Ak;T'\-A' 4„i,;rhA'LT(a)
r, r h A, A' di{As/i) rhA;i8
di T\~A;Ai
r, r (5) di r h A; 3xA(x)t
d^xAix)^) T h A; 3xA(x)s
d2 Ak;F' h A' LT(a) Ai\As;T ' h A ' i s ; r , r ' h A ,A' h A, A'
d2(a) A(a)n;3xA(x)i;T' \- A' 3xA(x)j, 3XA{X)J;T' \- A' r,r'h A, A' d2(a) A{a)n;3xA{x)i;T'h A' rfl r h A; 33^4(3:), 3 ^ ( 3 : ) , ; a ^ Q c ^ ; T' h A' 3xA(x)s; T, T' h A, A' T,T'\- A, A'
Cut-formula active on the left (6)
di d2 T\-A;(AABm);Aj V h A'; (A A Bm); B, r,r'hA,A';AABm,iABm
d3 ,4 A flm;r" I-A"
r,r',r"h A,A',A" d\
d>2
rhA;(AABm);Aj r'hA';(AABmy,Bj r , r ' h A,A';AABm;AABs
d3 AABm\T"\-
A"
r,r',r"i-a,A',A";iAB 8 r,r',r"i-A,A',A" (7)
dx r'hA'^j „ r,lhA";Ai,ii
(A A Ba/m)d3 AABS-,T"\- A"
d2 Ai;rhA
r",ri- A", A di T'\-A';Ai
d2
r"hA";^;A8R ^fhA T",r\-A",A;AS r",ri-A",A
(J4s/i)d2 A3;r\-A
where R is ∨-, →-, ∃-, ¬-right, right thinning (a) or (b)—provided that, if R is right thinning (a), A_i is not the active formula in its premise.
(8)
^ T\-A;Aj
d2
r\-A;Ak,AkKTW Ak;T'\-A' r,ri-A,A'
F h A
^
.RT(a)
d2
rhA;i,;4 W 4r'HA' iTFKATATA:
(^s/fc)d2 A,;fhA'
(9)
di(o) r \-A;VxA(x)f, A(a)n r h A; V x ^ x ) ^ V x ^ x ^
d2 V x ^ x ) ^ T h A'
r, r \- A, A' di(a) T\-A^xA(x)i;A(a)n r h A; Vx^(x) 8 ; Vx^(x)j Vx^(x) i ; T' I- A' r,ri-^.A^Vs-Apc), r , r i - A, A'
(Vx>l(x) s/i )d 2 ViA(i)8;r'hA'
E. Elimination of Cuts with a Thinned Premise

(1)
di Tl-A;^
d2
d'(Aj/s) A,A',A,
rhA,^;gfc fit;rhA' r,r'hA,AUj
r,r h where d' =
T h A; ^
r,r'HA,A';^ s (2) is like (1) except with Ai, ^ and As on the left. (3) dl
d2 Tj^hA'
(^/s)d'
ri-A;g fc B t ; r',^i-A' r,r,^hA,A'
r, r , ^ \- A, A' where d'=
r';^hA r,r';ishA,A'
(4) is like (3) except with .4,, Aj and As on the right. F. Thinning Permutations Premise of the thinning passive in the preceding inference Left Thinning (a) (1)
a. b.
Aj r- Aj Ak,Bnh Aj d
4rhA ii;r"HA" ^,c*fc,r"i-A"
Ak\-As Ak,Bn\-Aj d
A<;ri- A 4Ct,rhA „ Aj,ck,r"\-A"K
where R is -i-left, V-, 3-, V-right, right thinning (a) or left thinning (6), provided that, if R is V-right and the proper parameter of R occurs in C, it is replaced by a parameter which does not occur in the figure on the right. d (Bs/n)d 4r;BnhA 4r;5shA n Ai;T"\-A" A,,Ci,r;fl,hAn
Aj, ck, r" h A" rt
^•,cfc,r"hA"
where R is ->-, —>-right, A-, 3-, V-left, right thinning (b) or left thinning (a), provided that, if R is 3-left and the proper parameter of R occurs in C, it is replaced by a parameter which does not occur in the figure on the right. (Bn is supposed to be a premise of R in d, and Bs in (Bs/n)d.)
(At);r!-A;EP WirhA'jf, 4r,r'hA,A',£AFm A j ,C fc ,r,r'l-A,AM5AF ra (^) ; rhA ; ff p (^r'l-A'if, (^),(C fc ),r h A;EP (^),(C fc ),F h A';F, A i ,C f c ,r,r / r-A,A',£AF r o ^ i ; r,r,£vF m i-A,A' ^,Cfc,r,r',£VFro(-A,A' (Es/P)di (4,);r;£8HA (Aj), (Ck), F;ES\-A
(Ft/q)d2 (^);r';F(hA' (Aj), (Ck), P ; Ft \~ A'
i j ,ft,r,r',£vF m hA,A' (Aj);r\-A;EP
( ^ r ' j f ^ A ^
4r"hA,A' ^,C f e ,r"hA,A' di(Es/p) (Ai);T\-A;E, (^),(Ct),rhA;£8
H
(Ft/q)d2 (4i);r';F«hA' {As),(Ck),r;Ft ^A'
^,C f c ,r"hAA'
R
where R is →-left or cut, with premises E_p and F_q in the figure on the left, and E_s and F_t in the one on the right. If R is cut, I assume that s = t. (Notice that the possibility of F_q being equal to A_i has not been excluded.)
(1b-f) above simply state that an application of left thinning (a) the active formula of whose premise is passive in the conclusion of the preceding inference can be permuted upwards past that inference. Ignoring complications which arise from the need to keep the active formulae in the premises of the inference distinct from the active formulae in the conclusion of the thinning (and, in case the inference is an application of ∀-right or ∃-left, from the need to ensure that the proper parameter does not occur in the conclusion of the thinning), there are really only two cases to consider according to whether the inference has one or two premises. Having shown how these complications are handled in the case of left thinning (a), there is no reason to give a similarly detailed treatment of the remaining thinning rules since no new problems arise. So, given a derivation

    d
    Γ ⊢ Δ
    Γ′ ⊢ Δ′

where J is an application of any one-premise rule, let the derivation d_J be obtained from d by applying whatever contractions may be necessary to preserve the distinctions mentioned above and, in case J is an application of ∀-right or ∃-left, by possibly replacing its proper parameter. (It is convenient to write the conclusion of d_J as Γ ⊢ Δ, even though it may differ from the conclusion of d in minor respects.) If J is an application of a two-premise rule which operates on the conclusions of d and d′, the derivations d_J and d′_J are explained similarly. With the help of this notation the remaining cases can be presented in a simple and uniform way.

Left Thinning (b)

(2)
a.
As h Ak Bn,Ai\- Ak
Aj h Aj Bn,Ai\-Ak d T\-A;Ai T'l-A';^
b.
di r l
rh A Ck,T\-A,Aj -5=
c.
d
T-.
n
:— 1
d'
r\-A;{Aj) r'hA';(ii) F'hA";^
Cfe,r"hA",A
•J
dj
d'j
r\-A;(Aj) r'HA';^) (<7fc),rhA,(A,-) (Ctl.r'hA',^) Ck,r'\-A",Aj
Right Thinning (a) (3)
Ai\-Aj A3hAk Ai\-Ak,Bn Ai\-Ak,Bn b. Like (2b), but with Ck on the right. c. Like (2c), but with Ck on the right. Right Thinning (b)
(4)
a.
a.
A* h Aj Ak \-Aj,Bn
AkhA8 Ak \- Aj,Bn
b. Like (3b), but with Ai and Aj on the left, c. Like (3c), but with Ai and Aj on the left. Premise of the thinning active in the preceding inference (5)
a.
d Ai^BjYJhA Bj;r\-A Bk,Cn,T\-A
(As/i)d A.;(Bj);r\-A As;(Bk),(Cn),T\-A At;(Bk),Cn,T\-A K Bk>Cn,T\-A
where R is ∧-, ∀- or ∃-left. In this last case, if the proper parameter of R occurs in C, it must be replaced in (A_s/i)d by one which does not. b. Like (5a) with C_n on the right. (6)
a. Like (6b) below with Cn on the left. b.
d
K);rh4A -^jiTf-A ^Ak,T\-A,Cn
(7)
a. i.
d\
d(As/i) (^A^TY-A.-,* (^Ak),rhAs;A,(Cn) (-nAk),F\-At;AyCn ^Ak,T\-A,Cn c?2
4(Avfim);rhA Bf,(AvBmy,r'\-A' AVBm;r,r'hA,A' AVBfc.Cn.r.n-A.A' {As/i)di As;{AvBm);r\-A (Bt/j)d2 As;(AVBk),(Cn),r\-A Bt;(A V Bm);T' h A' Au;(A V Bk),Cn,r\-A Bt;(A V Bk), (C B ), V f- A' AvBfc,Cn,r,r'hA,A'
Normalization, Cut-Elimination and the Theory of Proofs ii.
d\
d,2
^;(^VBm);ri-A
Bf,(A\/Bm);T'\-A'
xvB f c ,c7„,r,ri-A,A / {Bt/j)d2 {As/i)di Bt;(A\/Bm);r'\-A' As;(AWBm);rh-A Bt; (A V Bk), (Cn),V \- A' As; (A V Bk), (C7W), T h A Bv; (A V £ fc ), C n , T h A' AVBk,Cn,T,r'hA,A' b. Like (7ai) and (7aii) with Cn on the right. (8)
a. i.
d\ di (A^Bm);r\-A;Ai Bj;(A^Bmy,r'\-A' ^Bm;r,r'hA,A' yl^B fc ,C„,r,r'hA,A' di (A-*Bmyr\-A;Aj (A^Bk),(Cn),r\-A;Ai (A -> B f c ),Cn,r h A;AU
(Bt/j)d2 Bt; (A -> ffm);T I- A / Bt;(A - B f c ) , ( C B ) , r h A'
i^Bt)c„,r,rhA,A' ii.
di
(A^Bmyr\-A;Ai
c?2
B^iA-^B^-T'hA'
A-flm;r,rhA,A; ^-Bfc,cn,r,r'i-A,A'
di B,;(i^fim);r'hA' (A-a^rhA;^ fii;(i-^gt),(g,r'hA' (A - Bk), (Cn), r h A; ^ B,; (A -> Bk), Cn, V h A' A^5fe)cn,r,ri-A,A' b. Like (8ai) and (8aii) with C„ on the right—except that, if At = Cn, d\ in the figures on the right must be replaced by d\(As/i) whenever (A - > B m y r / r . (9)
i.
d (Bmyr;Ai\-A Bm;T,Aj\-A Bk,Cn,T,Aj\-A
(As/i)d (Bmyr;As\-A (Bk),(Cn),r-,Aa\-A {Bk),Cn,T;AtyA Bk,Cn,Y,Ajb A
A Formulation of the Classical Sequent Calculus 217 ii.
d (Amy,r;Ai\-A Am;Tt-A 4,C„,rhA
d (Am);r;iihA ^m;r;^sl-A Ak,C„,T;As\-A Ak,Cn,T\-A
a. Like (9ai) and (9aii) with Cn on the right. (10) a. Like (10b) below, with Cn on the left. b. d d{Aa/l) {Bm);T\-A;Aj (B m );T h A; As £?m;rhA,^ (Bk),T\-A,{Cn);At Bk,rhA,Aj,Cn Bk,T\-A,{Cn);At Bk,T\A,Aj,C„ (11) a.
d rhA;(Bm);A rhA;Bm T\-A,Bk,Cn
• ix
d(A,/i) T \-A; (Bm); A rr-A,(Bfc),(Cn);is r\-&,(Bk),Cn;At n rhA,^,cn
where R is ∨-, ∃- or ∀-right. In this last case the proper parameter of R, if it occurs in C, must be replaced in d(A_s/i) by one which does not. b. Like (11a) with C_n on the left. (12) a.
d r;Ai\-Bj;(A-+ Bm);A D-A^Bm;A T\-A^Bk,Cn,A
d(Bt/j) Y;A^Bu{A^Bm);A T;AihBu(A^Bk),{Cn),A r;Ai\-Bu;(A->Bk),Cn,A Y\-A-+Bk,Cn,A
b. Like (12a) with Cn on the left and (As^)d
instead
oid{Bt/j).
(13) a. Like (13b) below with Cn on the right. b. d (A9/i)d r;^h(^);A r;A.\-(^Ai);A rf-^;A T,(Cm);As\-hAk),A T,Cm\-^Ak,A r,Cm;At\-(^Ak),A T,Cm\-^Ak,A (14)
i.
dx d2 r\-A;(AABm);Ai F' 1- A'; (A A Bm); B, r,r'\-&,&';AABm
r , r i - A, A', A A Bk,cn
Normalization, Cut-Elimination and the Theory of Proofs di{Aa/i)
rhA;(AAB m M, d2(Bt/j) r\-A,(AABk),(Cn);Aa T' \- A'; (A A Bm); Bt V h A', (A A Bk), (Cn); Bt r I- A, (A A Bk), Cn; Au r,r'hA,A',AABfe,c„ ii.
d\ r\-&i(AABm);Ai r,r'*-A,A';AABm r,T't-A,A',AABk,Cn
^2 rr-A'jjAABnhBj
dMs/i) r\-A;(AABm);Aa r h A, (A A Bk), (C n ); ^ s
T'hA';{AABm)\Bt V \- A',(A A Bk),(Cn);Bt r h A', (A A Bk), Cn; Bv
r,r \-A, A \ A A Bk,cn a. Like (Mai) and (14aii) with Cn on the right. (15)
i.
ii.
d r\-A;Ai;(Bm) T\-A,Aj;Bm ThA,Aj,Bk,Cn d T\-A;Ai;(Am) T\-A;Am T\-A,Ak,Cn
d(As/i) T\-A;As;(Bm) r\-A,(Bk),(C„);As r\-A,(Bk),Cn;At Y\-A,Aj,Bk,Cn d T\-A;Ai;(Am) T\-A;Am;As r\-A,Ak,Cn;As T\-A,Ak,Cn
a. Like (15ai) and (15aii) with Cn on the left. (16) a. Like (16b) below with Cn on the right. b. d (As/i)d r;Ai\-A;(Bm) T; As b A; (Bm) r,Aj\-A;Bm (Cn),T;AahA,(Bk) Cn,T,Aj h A , B k Cn,r;i,hA,(Bt) Cn,T,Aj\-A,Bk G. Additional Thinning Permutations (1)
a.
d T\-A;Ai Tr-A,Bj;Ak r\-A,Bj,Cn,Am
d T\-A;Ai r\-A,Cn;As Tr-^B^CntAr,
(2)
b. Like (la) with Cn on the left. c. Like (la) with Bj on the left. d. Like (lc) with Cn on the left. a. d Ak-Bj,T\A Ami Cn•> Bj, r r A b. Like (2a) with Cn on the right. c. Like (2a) with Bj on the right. d. Like (2c) with Cn on the right.
d Ai-,T\- A As;Cn,Th-A Am,Gn,Bj,l r A
Appendix C
Proofs and Categories

This appendix outlines one way in which the derivations of a formal system may be regarded as representing the morphisms of a category with some additional structure. The possibility of such a representation arises from the similarity between the rules of Gentzen's N and L systems, on the one hand, and the definitions of product, exponent, etc. in category theory, on the other. Because these morphisms must satisfy certain identities, they are not in general represented by a unique derivation. This naturally suggests the question of what logical sense can be made of the notion of two derivations representing the same morphism and whether this relationship can be characterized solely in terms of structural properties of the derivations themselves, without reference to their categorial interpretation. These topics were first investigated by Lambek in a series of papers on deductive systems and categories.1 Subsequently, Szabo gave an account of the negative fragment of LJ in terms of a relation between derivations which he called 'equi-generality'.2 Mann reproduced Szabo's results for the negative fragment of NJ and showed that equi-generality was equivalent to the relation 'being reducible to the same expanded normal form' (in the sense of Prawitz).3 When Szabo extended his treatment to the whole of intuitionistic (first-order) predicate logic, he abandoned his original equivalence relation between derivations in favor of one which, like Mann's, is defined in terms of unique normal forms.4 (The word "unique" needs some qualification, but I shall ignore that complication here.) Derivations most naturally represent morphisms having a sequence of

1 "Deductive systems and categories I," Mathematical Systems Theory, Vol. 2, pages 287-318, "Deductive systems and categories II" in Springer Lecture Notes in Mathematics, Vol. 86, pages 76-122, and "Deductive systems and categories III" in Springer Lecture Notes in Mathematics, Vol. 274, pages 57-82.
2 "A categorical equivalence of proofs," Notre Dame Journal of Formal Logic, Vol. 15, 1974, pages 177-191. See also the addendum to this paper in the same journal, Vol. 17, 1976, page 78.
3 "The connection between equivalence of proofs and cartesian closed categories," Proceedings of the London Mathematical Society (3), Vol. 31, 1975, pages 289-310.
4 Algebra of Proofs, Amsterdam, 1978.
objects, rather than a single object, as their domain. There is, however, no standard way of treating such morphisms within category theory. Of course, in the presence of pairing (or products) an n-place function can always be regarded as a function of one argument, but this approach is not well suited to the present purpose. On the other hand, the multicategories of Lambek and sequential categories of Szabo are cumbersome and difficult to work with. I propose to adopt here an alternative notion of 'multicategory'. Rather than defining it at the outset, however, I shall try to show how it arises from the attempt to introduce additional structure into an ordinary category. The idea is to give a categorial interpretation of propositional logic (or, more properly, of its derivations). So, let Lₚ be a language comprising a set P of propositional variables, the propositional constants ⊤ and ⊥, and the binary connectives ∧, ∨ and →. Let Cₚ be the discrete category whose objects are the members of P. I want to extend Cₚ to a cartesian category C∧ which will serve to interpret logic based on the ∧-fragment of Lₚ. The objects of C∧ will be the formulae built up from the members of P by means of conjunction. The morphisms of C∧ will include an identity morphism for each object. In addition, for each object of the form A ∧ B, there will be projection maps π₁: A ∧ B ↦ A and π₂: A ∧ B ↦ B. (I will suppress reference to their domains when these are obvious from the context.) Finally, if f: C ↦ A and g: D ↦ B, there will be a unique pairing map ⟨f, g⟩:
C, D ↦ A ∧ B which makes the following diagram commute:
[Product diagram: ⟨f, g⟩ maps C, D to A ∧ B, with π₁: A ∧ B ↦ A and π₂: A ∧ B ↦ B; f: C ↦ A and g: D ↦ B factor through ⟨f, g⟩.]
Here, C and D are supposed to be sequences of objects of C∧, and C, D their concatenation. It is convenient to identify the object A with the sequence of length 1 whose only term is A. It remains to ensure that the multimaps of C∧ are closed under composition. The commutativity of the product diagram means that

    f = ⟨f, g⟩ ∘ π₁    and    g = ⟨f, g⟩ ∘ π₂

As a result, the domains of ⟨f, g⟩ ∘ π_i (i = 1, 2) will be a subsequence of the domains of ⟨f, g⟩. In general, unless f and g both have single domains, the domains of f ∘ g may differ from those of f. This makes it necessary
to specify domains when composing multimaps. It will usually be obvious what is intended, however, and in such cases I won't indicate it explicitly. In addition to the above, C∧ should satisfy the usual axioms for a category (adapted straightforwardly to take account of multimaps). It follows that composition, however we understand it, must satisfy the following conditions:

(1) If g = 1_A and f: C ↦ A, then f ∘ g = f; if f = 1_E (where E is a term of C) and g: C ↦ A, then f ∘ g = g.
(2) If f: C ↦ A ∧ B is of the form ⟨h, h′⟩, where h: D ↦ A, h′: D′ ↦
B
then:
a. if<7 = 7r^ A B , fog b. ifg = 7rf*B, fog
= h = h!
(3) If 9 = (h, ft'): I?, D'*-* A A B and / : C»-> E, then fog
= (fohJoh'):
{C
/E)DJ)'^AAB
By convention, / o g is just # when the range of / does not appear among the domains of g. (C IA) D is supposed to be the result of replacing each occurrence of A in D by the sequence C(1) follows from the properties of identity morphisms in a category, (2) from the commutativity of the product diagram, and (3) from the uniqueness requirement on morphisms of the form (x, y). To see this last notice that f oh: (C /E) D>-> A and / o h'\ (C /E) D'*-> B, so there is a unique (fohjo
h'): {C IE) D, D'^
A/\B
such that
(foh,hof')oiri
= f oh
and (foh,foh')on2
=
foh'
But (fog)oTTi
= fo(goTTi)
=
foh
and ( / og)o7T2
=
foh'
Hence f°g = ifohjoh') Suppose we have closed the class of projection maps under composition. (Since these always have a single domain, n o n' for example will be a map from the domain of n to the range of 7r'.) We can now define the class of multimaps of C A to be the closure under pairing of the identity, projection and compositions of projection maps. It is then easy to infer from conditions (l)-(3) above, using the associativity of composition, what might be called the cut-elimination theorem for the A-fragment of LJ, namely:
Proofs and Categories
223
The morphisms of C A are closed under composition. In other words, CA is a cartesian category.5 I now want to extend this treatment to include implication. The idea is to construct a cartesian closed category, C A -M whose objects will be the formulae built up from P by conjunction and implication. The morphisms of CA— will include identity, projection and pairing maps for this enlarged set of objects together with an evaluation map eA~*B: A,{A-* B) *-+ B for each object of the form A —• B. In addition, for each morphism / : A, ( 7 ^ B, there will be a unique exponent map e ( / ) : C^ (A —• B) such that the following diagram commutes:
The notation ^ * • for arrows is used to make it easier to read the domains of the various maps from the diagram. Here, e(f)oe and / have the same domains and the commutativity of the diagram allows us to identify these two morphisms. In addition, to (1), (2) and (3) above, composition must satisfy: (4) If g is for the form e(h): C*-> {A —• B) and / : D*-* J5, where E appears among the terms of C, then / o g: (£) jE) C*-* (A —> B) is just
e{foh). (5) If / : D»-> {A —> B) is of the form e(/i), where h: A,D*-+ B, and g is eA^B, then / o g = h. (Again, (4) follows from the uniqueness of e(f o h) and (5) from the commutativity of the exponent diagram.) It might seem, by analogy with C A , that we need only close the evaluation and projection maps (taken together) under composition, add the resulting class to the identity morphisms of CA— and then close everything under pairing and exponentiation in order to obtain a cartesian closed category. Even with these additions, however, and conditions (l)-(5), there 5 C/. the discussion of logical calculi at the beginning of Chapter 4 above, where it was pointed out that the meaning of 'application of a rule' determines the notion of substitution for derivations and that the class of derivations is closed under the latter operation.
224
Normalization, Cut-Elimination and the Theory of Proofs
is still one possibility left uncovered, namely the case in which / : D*~+ A is composed with eA^B'. This makes it necessary to introduce morphisms / o eA~*B: D,{A-+ B) \-+ B for each such / if the cut-elimination theorem is to hold for CA—• An alternative (more in the spirit of Gentzen's —»-left rule) is to replace the maps eA~~*B by ef~*B: D, (A-> B) *-> B, for each / : D*-+ A, and require that, for any h: A,C>-> B, there is a unique map e(h) such that the following diagram commutes:6 / A,C
C D
i
V
e(h)\
i A-+B -
_ _ ^ >
* > B
If this alternative is adopted, condition (4) remains unchanged, condition (5) generalizes to: (5') If / : D*-> (A —> B) is of the form e(ft), where h: A, D>-> B, and g is of the form g: C,(A-> B) »-• B is of the form eArB, then fog — h' oh.
where h': C^
A,
and the last remaining case is dealt with by the following: (6) If / : E^> F (where F # A -+ B) and p = e*: 5 , (4 - • B) *-> B, then / o g: (E /F) D,(A->
B)^B
is efoh.
The cut-elimination theorem for CA_>—i.e., the statement that its morphisms are closed under composition, or that it is a closed cartesian category—can now be proved with the help of conditions (l)-(6) by a slightly more complicated inductive argument than was the case for C A . That the maps eA~*B do not suffice for cut-elimination is reflected in the sequent calculus by the fact that cut-elimination does not hold when the —•-left rule is replaced by T;Bh A Y,A,A->B\A 6
Notice that e(h) does not depend upon / .
Proofs and Categories
225
In natural deduction calculi, it is reflected in the structure of normal derivations—more precisely, in the fact that a branch of such a derivation cannot pass through the minor premise of an application of —^-elimination if it is to consist of a series of eliminations followed only by introductions. Despite the added complications associated with implication the above seems to provide rather a satisfactory interpretation of a fragment of the sequent (or natural deduction) calculus and its associated normal form theorem. Furthermore, it is easily extended to the full negative fragment (of intuitionistic propositional logic) by introducing an initial object _L into CA— together with a unique morphism JL^: _!_»-*• A for each object A. Unfortunately, the picture is spoiled somewhat by a number of complications having to do with the 'structural' properties of morphisms which I have chosen to disregard here. For example, we really need some principle corresponding to Gentzen's interchange rule which, given / : A,B »-• C, say, will ensure the existence of / ' : B,A*-+C and allow us to treat / and / ' as equivalent in some sense. For reasons such as this, the approach sketched above does not work out very well in detail. It can, however, be modified in such a way as to avoid these difficulties. The modified approach is connected to the present one as the formulation of sequent calculi in terms of sets of indexed formulae is connected to the usual formulation in terms of sequences. Basically, the idea is to operate with sets of indexed formulae rather than sequences. The objects of the category should still be formulae, however, rather than indexed ones. One way of accomplishing this is to think of multimaps as having arbitrarily long sequences of sets of domains, not excluding the empty set. The occurrence of A as a member of the mth set of a sequence will be associated with the indexed formula Am. Within this framework, the interchange principle mentioned above is no longer needed, and structural operations corresponding to contraction and thinning can be conveniently introduced if desired. My aim in this appendix has just been to give a preliminary exposition. For that purpose, the approach taken above seems the most perspicuous and is easiest to motivate. Rather than modifying it now and trying to spell out in detail the interpretation of derivations from the negative fragment, I want to conclude with a brief sketch of how disjunction might be incorporated into this framework. Let us begin by reconsidering Cp to see what is involved in extending it to the dual of C A , the co-cartesian category C v - The objects of C v are just the formulae built up from P using V, and its morphisms include identity maps (one for each object), injection maps z^ VB : A »-> A V B and %2yB' B H-> A V B for each object of the form A V B, and, given / : A *-+C and g: B
H-ȣ),
a unique map [/, g]: Ay B i-+C, D, such that the following
226
Normalization, Cut-Elimination and the Theory of Proofs
diagram commutes: AVB
Dually to the case of C A , the commutativity of the coproduct diagram means that / = ii°[f>9] and g = i2<>[/,0] so that the ranges of ij o [/, g] (j = 1, 2) will be a subsequence of the ranges of [/,]. In general, unless / and g both have single ranges, the ranges of fog may differ from those of g. Furthermore, composition in C v must satisfy the duals of conditions (1), (2) and (3), namely: (1') If f = 1A and g: A H-*C, then fog of C) and / : A >->C, fog
= g-, if g = \E (where E is a term
= f.
(2') If g: Ay B *->C is of the form [ft, ft'], where ft: A H->5, ft': B ~D' and C=D,D', then: a. if / = ti, fog = ft. b. if / = t2» f°9 = &'• (3') If / = [ft, ft']: AW B >-+C and #: D »-+£, then
fog
= [hog,tiog]:
AyB^(C/D)E
It can now be shown that, once the injection maps are closed under composition, C v becomes a category of the appropriate kind. Of more interest, however, is the result of trying to extend CA and C v to a category CAv containing both products and coproducts. Composition in such a category must satisfy both (3) and (3'). So if / = [ft, ft'] and g = (fc, fc') fog
= [h,ft'] o (i,fc')
= =
([ft,ft'] o fc,[ft,ft'] o fc') [fto(fc,fc'),ft'o(fc,fc')]
by (3) by (3').
In the special case that h: A*-> D, h': B *-+ D, h = ID and fc': C*~* E, where D is not among the terms in C, this yields:
w
([h,hik') = [(h,k'),(h\k')}
The derivations of NJ are mapped onto morphisms of a category in such a way that ft, for example, would be the image of a derivation of D from
A. Translated into these terms, (*) becomes:
[A] [B] AVB
[A]
n2 c D n = 3
Ki
D
D
' E
[B] C
C
iii n 3 D E DAE
AVB
DAE
n2 n 3 D E DAE
DAE
This illustrates the fact that, if derivations are to be interpreted as the morphisms of a category, the appropriate equivalence relation between them is not simply interreducibility in the sense of Prawitz, but interreducibility between equivalence classes of derivations obtainable from one another by permuting inferences.7 The example (*) does not depend upon particular features of the description of multicategofies, in fact, it does not depend upon multicategories at all and can be reproduced using the standard definitions of product and coproduct in a category. An example which does depend upon mappings having more than one domain is the following. Consider the maps / : A, G *-+ E, g: B,G *-+ E, h: C,H »-• E and A:: D.H^E. Then [/,#]: A V B,G ^ E and [h,k]: CVD,H^E.S Let a
(C.l) and (C.2)
= [ [ / , 0 M M H : AM B,G\/H,CV if
vD
D^E
o a = ft :
AyB.GVH.C^E
^ V D o a = /? 2 :
A\/B,GVH,D^E
It follows from the coproduct diagram that (C.3)
i f V H o fa = [/, g]
and
iff v H o fa = h
Also (C.4) i ? v " ° ft = [/,] and ifHofa = k But there is a unique map /?i satisfying (C.3), namely [[/,#],/i], and a unique map f32 satisfying (C.4), namely [[/, g), k]. Similarly there is a unique map a satisfying (C.l) and (C.2), namely [ft, ft] (= [[[/,#], h], [[/,£],*:]]) In short,
(C5) 7
[[/,ff],[M]] =
[[[fMM\f,9U]]
See also "Weak Adjointness in Proof Theory" by R. A. G. Seely (pp. 697-701 of Applications of Sheaves^ ed. by M. P. Fourman, C. Mulvey and D. S. Scott, Vol. 753 of Springer Lecture Notes in Mathematics, 1979), where a similar generalization of the permutative reductions is employed. 8 To avoid the kind of problem discussed in Chapter 4 above (in connection with multiple conclusion derivations), I assume some way of amalgamating the occurrences of G in the domains of [f,g] and likewise those of H in [h,k]. These problems are easily solved— disappear, in fact—when the present approach is modified along the lines suggested earlier. Furthermore, for the sake of simplicity, I have tacitly identified the sequence £ , E with the object E.
= [[[[f,9},h),[f,9}],[[[f,9},h}M (this last equality by the same argument applied to the right hand side of (C.5)) and so on, ad infinitum. The reason I have given this example is that it corresponds to Zucker's non-terminating, non-repeating reduction sequence for natural deduction derivations. Here, however, it takes the less troubling form of an infinite number of terms all of which denote the same morphism. Allowing morphisms which, in addition to a series of domains, also have a series of ranges introduces some problems which have not been addressed here. They are needed however if the full duality between products and coproducts is to be preserved. They also facilitate the interpretation of multiple conclusion derivations. For the derivations of NJ and LJ, on the other hand, they are not required and the coproduct diagram can be modified accordingly so that they do not arise.
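The ordinary Curry-Howard reading of the constructions discussed above can be sketched in a few lines of Haskell (this is my own illustration of the idea, not the multicategory machinery developed in the text): products, exponents and coproducts correspond to pair, function and Either types, and an analogue of the identity (*) becomes the familiar permutation of an and-introduction across a case analysis.

    -- Product structure: pairing; the projections are fst and snd.
    pairing :: (c -> a) -> (c -> b) -> c -> (a, b)
    pairing f g x = (f x, g x)

    -- Exponent structure: the evaluation map and the exponent map e(f).
    eval :: (a, a -> b) -> b
    eval (x, h) = h x

    expon :: ((a, c) -> b) -> c -> (a -> b)
    expon f y = \x -> f (x, y)

    -- Coproduct structure: copairing [h, h'] is Haskell's 'either'.
    copair :: (a -> d) -> (b -> d) -> Either a b -> d
    copair = either

    -- An analogue of (*): pairing a copair with another map is extensionally
    -- equal to copairing the two pairings, i.e. pushing the pair-formation
    -- inside the case analysis on the disjunction.
    lhs, rhs :: (a -> d) -> (b -> d) -> (c -> e) -> (Either a b, c) -> (d, e)
    lhs h h' k (ab, c) = (copair h h' ab, k c)
    rhs h h' k (ab, c) = copair (\a -> (h a, k c)) (\b -> (h' b, k c)) ab

The extensional equality of lhs and rhs is the Haskell shadow of (*): two syntactically different terms, related by a permutation of inferences, denote the same function, which is why interreducibility in the sense of Prawitz has to be taken modulo such permutations before derivations can be read as morphisms.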
List of Works Cited

Baker, G.P. and Hacker, P. Frege: Logical Excavations, New York, 1984.
Barwise, J., ed. Handbook of Mathematical Logic, Amsterdam, 1977.
Brouwer, L.E.J. "The Effect of Intuitionism on Classical Algebra of Logic," Proceedings of the Royal Irish Academy, Section A 57, 1955, pp. 113-116; reprinted in Vol. I of Brouwer's Collected Works, pp. 551-554.
Brouwer, L.E.J. Collected Works, Vol. I, edited by A. Heyting, Amsterdam, 1975.
Church, A. The Calculi of Lambda-Conversion, Princeton, 1941.
Crossley, J.N. and Dummett, M.A.E., eds. Formal Systems and Recursive Functions, Amsterdam, 1965.
Curry, H.B. and R. Feys. Combinatory Logic I, Amsterdam, 1958.
Dragalin, A.G. "A strong theorem on normalization of derivations in Gentzen's sequent calculus," Studies in the Theory of Algorithms and Mathematical Logic, ed. by A.A. Markov and V.I. Khomich, pp. 26-39 (Russian).
Dragalin, A.G. Mathematical Intuitionism: introduction to proof theory, Vol. 67 of the AMS series "Translations of Mathematical Monographs," Providence, 1988.
Fenstad, J.E., ed. Proceedings of the Second Scandinavian Logic Symposium, Amsterdam, 1971.
Feferman, S. Review of Prawitz, "Ideas and Results in Proof Theory," The Journal of Symbolic Logic, Vol. 40, 1977, pp. 232-234.
Fløistad, G., ed. Contemporary Philosophy: a new survey, Vol. I, The Hague, 1981.
Fourman, M.P., C. Mulvey and D.S. Scott, eds. Applications of Sheaves. Proceedings of the L.M.S. Durham Symposium 1977, Vol. 753 of Springer Lecture Notes in Mathematics, New York and Berlin, 1979.
Frege, G. Conceptual Notation, and Related Articles, translated and edited by T.W. Bynum, Oxford, 1972. (This contains a complete translation of Begriffsschrift.)
Frege, G. Translations from the Philosophical Writings of Gottlob Frege, edited by P. Geach and M. Black, 3rd edition, Oxford, 1980.
Friedman, H. "Equality between Functionals," Logic Colloquium: symposium on logic held at Boston, 1972-73, edited by R. Parikh, pp. 22-37.
Frisch, J.C. Extension and Comprehension in Logic, New York, 1969.
Gentzen, G. The Collected Papers of Gerhard Gentzen, edited and translated by M.E. Szabo, Amsterdam, 1969.
Gentzen, G. "Untersuchungen über das logische Schließen," Mathematische Zeitschrift, Vol. 39, 1934, pp. 176-210 and 405-431. (An English translation of this paper can be found in the preceding item.)
Girard, J.Y. "Linear Logic," Theoretical Computer Science, Vol. 50 (1987), pp. 1-102.
Girard, J.Y., Y. Lafont and P. Taylor. Proofs and Types, Cambridge, 1989.
Gödel, K. "Über eine bisher noch nicht benützte Erweiterung des finiten Standpunktes," Dialectica, Vol. 12, 1958, pp. 280-287.
Gödel, K. Collected Works, Vol. II, Oxford, 1990.
Grube, G.M.A. Plato's Republic, translated by G.M.A. Grube, Indianapolis, 1974.
Gunthner, F. and D.M. Gabbay, eds. Handbook of Philosophical Logic, Vol. I (Elements of Classical Logic), Dordrecht, 1983.
Heyting, A. Intuitionism, 3rd edition, Amsterdam, 1971.
Hilbert, D. and W. Ackermann. Principles of Mathematical Logic, New York, 1950. (This is a revised translation of the 2nd edition of their Grundzüge der theoretischen Logik, Berlin, 1938.)
Hilton, P., ed. Category Theory, Homology Theory and their Applications I, Vol. 86 of Springer Lecture Notes in Mathematics, Berlin and New York, 1969.
Hindley, J.R. and Seldin, J.P. Introduction to Combinators and λ-Calculus, Cambridge, 1986.
Howard, W. "The Formulae-As-Types Notion of Construction," manuscript, 1969; a slightly revised version of this paper appears in To H.B. Curry, edited by Seldin and Hindley, pp. 479-490.
Hyland, J. and R. Gandy, eds. Logic 76, Amsterdam, 1977.
Kanger, S., ed. Proceedings of the Third Scandinavian Logic Symposium, Amsterdam, 1975.
Kleene, S.C. Mathematical Logic, New York, 1967.
Kline, M. Mathematics and the Loss of Certainty, Oxford, 1980.
Kneale, W. "The Province of Logic," Contemporary British Philosophy, third series, ed. by H.D. Lewis, London, 1956, pp. 237-261.
Kneale, W. and M. Kneale. The Development of Logic, Oxford, 1962.
Kreisel, G. "A Survey of Proof Theory," Journal of Symbolic Logic, Vol. 33, 1968, pp. 321-388.
Kreisel, G. Review of Tait, "Intensional Interpretations of Functionals of Finite Type I," Zentralblatt für Mathematik, Vol. 174, 1969, pp. 12-13.
Kreisel, G. Review of Szabo (ed.), The Collected Papers of Gerhard Gentzen, The Journal of Philosophy, Vol. 68, 1971, pp. 238-265.
Kreisel, G. "A Survey of Proof Theory II," Proceedings of the Second Scandinavian Logic Symposium, ed. by J.E. Fenstad, pp. 109-170.
Lambek, J. "Deductive Systems and Categories I," Mathematical Systems Theory, Vol. 2, pp. 287-318.
Lambek, J. "Deductive Systems and Categories II," in Category Theory, Homology Theory and their Applications I, edited by P. Hilton, pp. 76-122.
Lambek, J. "Deductive Systems and Categories III," in Toposes, Algebraic Geometry and Logic, edited by F.W. Lawvere, pp. 57-82.
Lawvere, F.W., ed. Toposes, Algebraic Geometry and Logic, Vol. 274 of Springer Lecture Notes in Mathematics, Berlin and New York, 1972.
Leivant, D. "Assumption Classes in Natural Deduction," Zeitschrift für mathematische Logik und Grundlagen der Mathematik, Vol. 25, 1979, pp. 1-4.
Lewis, H.D., ed. Contemporary British Philosophy, third series, London, 1956.
Mann, C. "The Connection between Equivalence of Proofs and Cartesian Closed Categories," Proceedings of the London Mathematical Society (3), Vol. 31, 1975, pp. 289-310.
Markov, A.A. and V.I. Khomich, eds. Studies in the Theory of Algorithms and Mathematical Logic, "Nauka," Moscow, 1979 (Russian).
Martin-Löf, P. "About Models for Intuitionistic Type Theories and the Notion of Definitional Equality," Proceedings of the Third Scandinavian Logic Symposium, edited by S. Kanger, pp. 81-109.
McCall, S., ed. Polish Logic 1920-1939, Oxford, 1967.
Nagel, E., P. Suppes and A. Tarski, eds. Logic, Methodology and Philosophy of Science, Stanford, 1962.
Olson, K.R. An Essay on Facts, Stanford, 1987.
Parikh, R., ed. Logic Colloquium: symposium on logic held at Boston, 1972-73, Vol. 453 of Springer Lecture Notes in Mathematics, Berlin and New York, 1975.
Pottinger, G. "Normalization as a Homomorphic Image of Cut-Elimination," Annals of Mathematical Logic, Vol. 12, 1977, pp. 323-357.
Prawitz, D. "Ideas and Results in Proof Theory," in Proceedings of the Second Scandinavian Logic Symposium, edited by J.E. Fenstad, pp. 235-307.
Prawitz, D. Natural Deduction. A Proof-Theoretical Study, Uppsala, 1965.
Prawitz, D. "On the Idea of a General Proof Theory," Synthese, Vol. 27, 1974, pp. 63-77.
Prawitz, D. "Philosophical Aspects of Proof Theory," Contemporary Philosophy, Vol. I, ed. by G. Fløistad, pp. 235-277.
Prawitz, D. "Towards a Foundation of a General Proof Theory," Logic, Methodology and Philosophy of Science IV, edited by P. Suppes et al., pp. 225-250.
Ribenboim, P. The Book of Prime Number Records, 2nd edition, New York, 1989.
Russell, B. and A.N. Whitehead. Principia Mathematica, Vol. I, Second Edition, Cambridge, 1927.
Schwichtenberg, H. "Proof Theory: Some Applications of Cut-Elimination," Handbook of Mathematical Logic, ed. by J. Barwise, pp. 867-895.
Seely, R.A.G. "Weak Adjointness in Proof Theory," Applications of Sheaves, ed. by M.P. Fourman, C. Mulvey and D.S. Scott, pp. 697-701.
Seldin, J.P. and J.R. Hindley, eds. To H.B. Curry: essays on combinatory logic, λ-calculus and formalism, New York, 1980.
Shoesmith, D.J. and T.J. Smiley. Multiple-Conclusion Logic, Cambridge, 1978.
Sundholm, G. "Systems of Deduction," Handbook of Philosophical Logic, Vol. I, ed. by Gunthner and Gabbay, pp. 133-188.
Suppes, P. et al., eds. Logic, Methodology and Philosophy of Science IV (proceedings of the fourth international congress for logic, methodology and philosophy of science held at Bucharest in 1971), Amsterdam, 1973.
Szabo, M.E. "A Categorical Equivalence of Proofs," Notre Dame Journal of Formal Logic, Vol. 15, 1974, pp. 177-191, and the addendum thereto in Vol. 17, 1976, p. 78.
Szabo, M.E. The Algebra of Proofs, Amsterdam, 1978.
Tait, W. "Infinitely Long Terms of Transfinite Type," Formal Systems and Recursive Functions, edited by Crossley and Dummett, pp. 176-185.
Tait, W. "Intensional Interpretation of Functionals of Finite Type I," The Journal of Symbolic Logic, Vol. 32, 1967, pp. 198-212.
Wang, H. Reflections on Kurt Gödel, Cambridge, Mass., 1987.
Zucker, J.I. "The Correspondence between Cut-Elimination and Normalization," Annals of Mathematical Logic, Vol. 7, 1974, pp. 1-156.
Index

Ackermann, W., 13, 18
active occurrence, 29
adequate for Π, 62
alike, 83
almost alike, 82
almost cut-free, 35
analytic part, 23
assumption class, 16
Baker, G.P., 170
Begriffsschrift, 168
Bernays, P., 171
Black, M., 169
branch of derivation, 72
Brouwer, L.E.J., 1, 163
canonical inference, 157
category, 73; cartesian, 221, 224; co-cartesian, 225; theory, 221
Church, A., 153
Church-Rosser type theorem, 40
classical negation rule, 18
closed assumptions, 15
closed cartesian category, 224
closed instance of an argument, 157
cluster, 69
co-cartesian category, 225
combination of graphs, 80
combinators, 154
compatible quasi-derivations, 83
completeness theorem for typed λ-calculus, 183
composition, 222
computable functionals of finite type, 170
computation rules, 8
congruent quasi-derivations, 82
contractum, 20
convertibility, 158
Copi, I.M., 11
coproduct diagram, 226
crucial elimination, 37
Curry, H.B., 153, 154
CUT, 86
cut-elimination theorem, 9
DT, 107
D′T, 109
definitional equality, 162
degree of a mix, 30
derivation, 5
discrete category, 221
Dragalin, A.G., 43, 193
eigenvariable, 17
elimination, 7, 17; segment, 21
equality of content, 168
equality relation, 8
equi-generality, 220
essential cut, 35
evaluation map, 223
expansion, 184
exponent diagram, 223
exponent map, 223
extensional equality, 172
extensionality, 173
Feferman, S., 159
Feys, R., 153
finitary mathematics, 171
finite type, 170
Frege, G., 14, 168
Friedman, H., 183
Frisch, J.C., 172
full type-structure, 183
Geach, P., 169
general proof theory, 39, 178
Gentzen, G., 7
Girard, J.Y., 55, 74, 154
Glivenko, 18
Gödel, K., 153
Grundgesetze, 168
Hacker, P., 170
Hauptsatz, 14, 28
Heyting, A., 154, 182
Hilbert, D., 1, 13, 18
Hilbert-style formalization, 13
Howard, W., 154
identity, 3; criteria, 2; morphism, 221; of proofs, 155
immediate subderivation, 194
inductive complexity, 195
inductive derivation, 195
initial object, 225
initial subderivation, 112
injection maps, 225
intensional object, 8, 174
intensionality, 172
introduction, 7, 17; segment, 21
intuitionism, 163
intuitionistic negation rule, 17
inversion principle, 155
Jaśkowski, S., 12
justification for an inference, 157
justifying operations, 156
Kleene, S.C., 39
Kneale, W., 70
Kreisel, G., 39, 159
λ-calculus, 8
λ-terms, 177
λβ-conversion, 165
Lambek, J., 220
left rules, 26
Leivant, D., 16
LJ, 27
LJ^(−∨), 52
LJD, 98
LJDT, 110
LK, 26
LKD, 98
LKDT, 110
logical operators, 6
Łukasiewicz, J., 12
main routes, 22
major premise, 17
Mann, C., 220
Martin-Löf, P., 8, 45, 154
maximal formula occurrence, 19, 102
maximal segment, 21
meaning, 153
minimal logic, 17
minimal segment, 21
minor premise, 17
mix rule, 28
morphisms of a category, 185
multicategories, 221
multimaps, 221
multiple-conclusion logic, 149
natural deduction, 7
ND, 86
NJ, 17
NJ′, 56
NJ^(−∨), 51
NJ^(−∨)′, 51
NJD, 97
NJDT, 110
NK, 17
NK′, 150
NK1D, 99
NK2D, 99
NKDT, 110
non-extensional domain, 173
normal form, 8, 20; theorem, 7; for NJ, 20
Olson, K.R., 169
open assumption, 15
pairing map, 221
permutative reduction, 21, 118
Plato's Republic, 181
Pottinger, G., 43
power of a formula occurrence, 197
Prawitz, D., 8, 154
primary and secondary qualities, 175
primitive reduction, 193
principle of extensionality, 174
principle of sufficient reason, 164
product diagram, 221
projection map, 222
proof, 5
proof-net, 75
proper parameter, 17, 51
proper reduction sequence, 40, 116, 199
proposition, 2
provable identity, 165
pruning reduction, 120
qualities, 175
quasi-derivation, 81
rank for indexed formulae, 35
rank of a formula, 30
rank of a mix, 30
reasoning, 180
redex, 20
reduce in one step, 20
reduction relations, 7
reduction sequence, 40
regular figure, 195
right rules, 26
routes, 22
rule of non-constructive dilemma, 150
Russell, B., 11
Schwichtenberg, H., 196
Seely, R.A.G., 227
separability property, 7
sequent calculus, 9, 26
sequential categories, 221
set theory, 6
Shoesmith, D.J., 55
simultaneous substitution, 58
Smiley, T.J., 55
strong cut-elimination, 186; theorem, 40
strong equivalence, 40
strong normalization theorem, 31
strong reduction, 167
strong validity, 157
subderivation, 20, 112
subformula property, 22
substitution, 91
Sundholm, G., 150
synonymy, 164
synthetic part, 23
Szabo, M.E., 220
T, 170
table of development, 70
Tait, W., 8, 154
t-connection, 50
theory of proofs, 5
thinning, 106; permutation, 122; reduction, 119
tree, 8
validity, 156
weak equality, 165
weak reduction, 165
weight of an application of cut, 198
Zucker, J.I., 9