Why implication is so weird?

Everyone who learned formal logic knows the truth table for implication:

\(A\)\(B\)\(A \Rightarrow B\)
001
011
100
111

The last two rows don’t raise any concerns, but the first two say that anything follows from a false statement. According to this table, the following statements are all true:

  1. 2 + 2 = 4 follows from 2 + 2 = 5
  2. The Earth existed for not more than 10 years, so 2 + 2 =5
  3. The Sun is black, so it follows that the Sun is red

All of these are obviously absurd, but professors keep teaching it at universities. I’ll try to explain why. I am not getting into much of the details and sometimes take a very liberal approach to the terminology, looking more for reader’s intuition rather than formal completeness. Hopefully, it wouldn’t become an obstacle and the reader would benefit from such informal reading about different views on this topic.

(Non-) explanation 1: By definition
You can make up an arbitrary binary operation (there are just 16 of them in classical logic) and call it whatever you want. This would be a valid definition which could be worked with. After all, any definition is nothing more than an agreement in terminology and this is the most common “explanation” which people hear in universities.

Even though such explanation is perfectly valid, it doesn’t give you much. Why exactly was this definition chosen among 16 possibilities and why is it called “implication”? It is still unclear. You also should keep in mind that if a teacher tells you that implication is just an operation according to the truth table, you are not in a position to derive paradoxes like the ones above since the connection between the truth table and the meaning of natural language is not defined with this approach.

(Non-) explanation 2: Paradoxical part doesn’t matter
I’ve seen this explanation in several lecture notes published online. If we apply implication to deduce true statements from true statements, we are fine. On the other hand, if we try to deduce something from false statements, we are in trouble anyway, since we assume something which is not true. So, the part of the truth table for false assumptions doesn’t matter.

This explanation is not entirely wrong, but it misses one important point: the paradoxical implication is used in mathematics, even though in most cases not directly. Let’s see how it appears in practice.

In calculus we define an open set as a set \(A\), such that for any \(x \in A\) we have some neighborhood \(N\) of \(x\), which lies within \(A\). We also treat the empty set as open and we can understand why using paradoxical implication. This is how we write that \(A\) is open formally (for simplicity I am skipping a formal definition of “neighborhood” here):

$$x \in A \Rightarrow \exists N. x \in N \land N \subset A$$

If we substitute the empty set for \(A\), we see that this property is true by paradoxical implication, since \(x \in \emptyset\) is always false. So the fact that empty set is open is not part of the definition, but is a theorem. This statement may look innocent, but it is critical. If we didn’t assume paradoxical implication and decided to treat the empty set as non-open, the whole theory would break. For example, we can prove that the intersection of two open sets is open, but if we take two open non-intersecting sets, their intersection is empty, so the empty set must be open, otherwise, we get a contradiction.

For another example where the paradoxical implication is used in mathematical proof directly, you can check Godel’s ontological argument, explained in this blog recently (see the proof of theorem 1).

As a deeper example, consider proofs by contradiction in general. Let’s say that we proved that \(B\) follows from \(A\). We also proved that \(B\) is false. We conclude that \(A\) must be false as well, since otherwise, we get that \(B\) is true (as we said, it follows from \(A\)), contrary to what we know. This kind of reasoning is common in mathematics, but how can we prove that proofs by contradiction do actually work?

It appears that proof by contradiction depends directly on paradoxical implication. We’ll make use of a logical rule called case analysis: if we proved that \(A\Rightarrow C\) and \(B\Rightarrow C\) and we know that \(A\lor B\) is true, then \(C\) is true as well. It is quite intuitive if you meditate on this for a minute or two, so I am not getting into details here. We consider this property as an axiom.

Let’s say that we proved \(A\Rightarrow B\) and \(\neg B\) (which means that \(B\) is false). We assume that all statements are either true or false, so \(A\lor\neg A\). We’ll apply case analysis to derive \(\neg A\) from here. First of all, we trivially have \(\neg A\Rightarrow \neg A\). If we assume \(A\) instead, then by \(A\Rightarrow B\) we know that \(B\) is true, but it is a contradiction, so by paradoxical implication we can derive anything from here, including \(\neg A\). So we got that \(\neg A\) follows from both \(A\) and \(\neg A\) and by case analysis we conclude \(\neg A\).

This is how proof by contradiction works under the hood. Other interpretations of this rule are also possible, but this one is the most common and I hope sufficient as an example.

Explanation 3: Logical necessity
There are a number of properties we want the implication operation to follow. For example, if \(A\Rightarrow B\) and \(B\Rightarrow C \) then we expect \(A\Rightarrow C\). This property is sometimes called transitivity and we can write it as

$$((A \Rightarrow B) \land (B \Rightarrow C)) \Rightarrow (A \Rightarrow C)$$

We consider this formula as an axiom, so it must be true irrespectively of the truth values of \(A, B, C\). Assume now that we already know the truth table of implication for the true assumption and we want to fill in the remaining entries of the table:

\(A\)\(B\)\(A \Rightarrow B\)
00?
01?
100
111

If we substitute 1 for \(A\) and \(C\) and 0 for \(B\) in transitivity axiom, we get the following formula

$$((1 \Rightarrow 0) \land (0 \Rightarrow 1)) \Rightarrow (1 \Rightarrow 1)$$

Using truth table for known values and taking into account that \(X\land 0 = 0\) for all \(X\), we can rewrite this statement as \(0 \Rightarrow 1\). Since we assume transitivity as an axiom, this final stament must be true.

Similarly, if in the example above we take \(C\) as false, we get that \(0 \Rightarrow 0\) is also true. So by demanding that implication has to be transitive, we were able to fill in the truth table for implication. Any other truth table would falsify transitivity property.

Transitivity is not the only property which forces the truth table to be what it is. As an even simpler example, we can consider the property

$$\neg (A \Rightarrow B) \Leftrightarrow (A \land \neg B)$$

This equivalence is also what we naturally expect from an operation called “implication”: we are sure that there is no implication exactly when we can provide an example for a truthful assumption and a false consequence.

Explanation 4: By definition II

So what we get is that the truth table for implication is forced by the properties which we want it to have, but it is only half of the story. The previous explanation makes a false impression that paradoxical implication is forced by the fact that any statement can be either true or false. It is called classical assumption, but it is not as obvious as you might think. There are plenty of different kinds of logics which have either more truth values or don’t make any assumptions about truthfulness at all.

If we don’t assume that everything is true or false, we cannot really have a truth table, but we still have alternatives. Before I present them, I have to explain how logical systems are defined in general.

The science of logic has two basic purposes. The first one is to provide a language which captures only a “logical essence” of statements without attaching any particular meaning to them. The second is to give rules according to which statements can be derived from each other. Consider a classical example:

  • All humans are mortal
  • Socrates is a human
  • Therefore, Socrates is mortal

This is trivial logical reasoning, but it is quite a general one. We can apply the same logic to a completely different problem domain:

  • All cats are beautiful
  • Tom is a cat
  • Therefore, Tom is beautiful

The situation is different, but you may notice that the reasoning is exactly the same. We can apply it even to absurd statements:

  • All quipee are totoked of unkes
  • Kupe is quipee
  • Therefore, kupe is totoked of unkes

We don’t understand what “quipee”, “unkes” and “kupe” are, but we can see that at least the reasoning is correct if such things ever exist. Logic science abstracts away from the specific meaning of the phrases and allows us to concentrate only on the derivations of one statement from another. Using formulas the same deduction can be written as

  • \(P(x) \Rightarrow Q(x)\)
  • \(P(a)\)
  • Therefore, \(Q(a)\)

Here instead of sentences of language we use formulas, which lack any meaning, but which still allow us to capture the pattern of reasoning.

The simplest logical system known to me is the one which is taught in universities and which is called “propositional calculus”. The syntax of its formulas is very simple. A formula can be one of the three:

  1. A letter: A, B, etc. Letters symbolize logical statements, the exact meaning of which is not important.
  2. Expression \((A) \land (B)\), where \(A\) and \(B\) are formulas.
  3. Expression \(\neg (A)\), where \(A\) is a formula.

Brackets around \(A\) and \(B\) above allow to parse formulas unambiguously. For example, expression \(\neg ((A) \land (B))\) is precisely defined, while formula \(\neg A \land B\) can be interpreted either as \(\neg (A \land B)\) or as \((\neg A)\land B\). In practice, we often omit excessive brackets and have priority rules for logical operations, but from the formal point of view, it is not needed.

For some expressions we define shorthands. As the first example we can define \(A\lor B\) to be a shorthand for expression \(\neg((\neg A)\land(\neg B))\). If you meditate about it for a moment, you’ll see that it matches our intuitive understanding of operation “or”. To say “A or B” is the same as to say “It is not true that both A and B are false”. The difference with truth table approach here is that we don’t actually assign any truth value to operation \(\land\), we only say that it is a shorthand.

In the same way, we can say that \(A\Rightarrow B\) is just a shorthand for \(B\lor\neg A\). Once again, we don’t assume that every statement is true or false and we don’t define the truth table, we just say that \(\Rightarrow\) is a shorthand for a particular syntax construct, which happens to intuitively match our definition of “implication”. To say that “A implies B” is the same as to say that “either B is right, or A cannot be right”.

At this point, if we wish, we can introduce truth tables. We only need to define the truth table for initial operations “and” and “not” and from there we can conclude what is the truth table of implication. As a non-classical example, consider Kleene’s ternary logic, where apart from “true” and “false” we also have logical value “unknown”. In the table below “and” and “not” operations are considered given, while truth values for “or” and “implication” are derived using the shorthand definition:

AB\(A\land B\)\(\neg A\)\(A\lor B\)\(A\Rightarrow B\)
FFFTFT
FUFTUT
FTFTTT
UFFUUU
UUUUUU
UTUUTT
TFFFTF
TUUFTU
TTTFTT

Another example is fuzzy logic. In this logic, statements can be “partially true”, which is represented by logical values in range \(\left[0;1\right]\). Logical “and” in this case is defined as an operation of taking the minimum value among two reals: \(A\land B = \min (A, B)\). Operation “not” is defined as \(\neg A = 1 – A\).

Using the shorthand rule given above, we can now conclude that operation “or” is given by formula
$$A\lor B = 1 – \min(1 – A, 1 – B) = \max(A, B)$$
and implication by formula
$$A\Rightarrow B = max(1 – A, B)$$
You can see that if \(A\) is false (its value equals 0), then the implication is true (value equals 1). At the same time if \(A\) is true, then the value of the implication equals the value of \(B\).

There is a caveat, though. In classical logic, we can say that
$$A\Rightarrow B = (\neg A)\lor B = (\neg A)\lor (A \land B)$$
but if we use the latest expression as the definition of implication in fuzzy logic, we get a different formula (I use prime sign \(\prime\) to distinguish it from the previous definition):
$$A\Rightarrow’ B = \max(1 – A, \min(A, B))$$

Although this formula gives the same results in case if \(A\) is 0 or 1, for other values it is not equivalent. If we take, for example, \(A=0.5\) and \(B=0.75\), then according to our formulas
$$A \Rightarrow B = 0.75$$
$$A \Rightarrow’ B = 0.5$$

Which formula is correct? It is impossible to say. We must admit that the choice to define implication as \((\neg A)\lor B\) is quite arbitrary. It works for some logics, but doesn’t work for others and sometimes several different definitions are possible.

Explanation 5: Derivation rules

In the previous explanation, we mentioned that logic consists of two parts: syntax and derivations. We explained how an implication can be treated just as a shorthand for a logical formula. Here we’ll show that it also matches our intuitive idea of derivation.

Let’s introduce some notation. If formula \(B\) can be derived from formulas \(A_1,\ldots,A_n\), we denote it as \(A_1,\ldots,A_n\vdash B\). For our purposes we will need only four derivation rules:

  1. \(A\land B \vdash A\),
  2. \(A \land B \vdash B\),
  3. \(A \vdash A\lor B \), for arbitrary \(B\),
  4. \(A\lor B, \neg A \vdash B \).

You can easily check that these rules are true for the classical logic, but in fact we expect these rules to be valid for any kind of logic, not only the classical.

We want to define implication in such a way that \(A \Rightarrow B\) if and only if we can derive \(B\) from assumption \(A\). We can formulate it as a rule:

5. If \(A \vdash B\), then \(\vdash A \Rightarrow B\) (no assumption to the left of \(\vdash\) here, in other words if we need assumption \(A\) to derive \(B\), then we can derive \(A \Rightarrow B\) without assumption \(A\)).

Let \(\bot\) denote a statement which cannot be true:

6. \(\bot = A \land (\neg A)\) for arbitrary \(A\).

From here we can derive an interesting property of implication. Let’s take some arbitrary statement \(B\). Then

  1. Let’s assume \(\bot\).
  2. From assumption 1 we derive \(A\) by applying rule 1 to the definition of \(\bot\).
  3. From assumption 1 we derive \(\neg A\) by applying rule 2 to the definition of \(\bot\).
  4. From statement 2 we derive \(A\lor B\) by rule 3.
  5. Applying rule 4 to statements 3 and 4, we conclude \(B\).

So we managed to derive \(B\) from \(\bot\), which means that \(\bot \Rightarrow B\) by rule 5. It is exactly what we were looking for: any statement follows from absurd, but this time we managed to show this based only on very generic derivation rules and not any particular kind of logic. This rule is true even if we don’t assume that all of the statements are either true or false.

Explanation 6: Models

Let’s say that the universe is entirely described by a number of logical variables. For example, let’s say that just three variables \(A, B, C\) are enough. Each particular assignment of values to these logical variables gives us a particular instantiation of the universe. We call such instantiations models. In the example with three variables we will have 8 different models of the universe.

Now let’s say that we know some facts about the universe, which we are going to call axioms. For our three-variable world consider the following two axioms:

  1. \(A\lor B\)
  2. \(B = C\)

These axioms put restrictions on what the universe can be, so we have to throw away some of the models. In fact, after introducing these two axioms we have just three possible models remaining:

  1. \(A = 1, B = 0, C = 0\)
  2. \(A = 0, B = 1, C = 1\)
  3. \(A = 1, B = 1, C = 1\)

All other models are invalid. For instance, model
$$A = 0, B = 0, C = 0$$
is invalid because it doesn’t match the first axiom, and model
$$A = 1, B = 0, C = 1$$
invalidates the second axiom.

Now we can define implication:

Definition. \(A\Rightarrow B\), if and only if in any model where \(A\) is true, \(B\) is also true.

If we add axiom \(A\Rightarrow B\) to the set of axioms which we had before, the list of valid models narrows down to just two of them:

  1. \(A = 0, B = 1, C = 1\)
  2. \(A = 1, B = 1, C = 1\)

You can see from this example, that in case of axiom \(A\Rightarrow B\), if \(A\) is true in some model, then \(B\) must be also true. At the same time, if \(A\) is false, then the constraint which this axiom impose is not applied, although we still claim that the axiom is correct. Since \(A\Rightarrow B\) is an axiom, it must be true in all models, including the models where the constraint is not applied. Thus, we conclude that \(A\Rightarrow B\) is true when \(A\) is false.

Conclusion

There are many other ways of looking at the implication. Some of them are purely philosophical, others involve more advanced mathematics. Hopefully, the approaches which I demonstrated shed some light on this topic and now the reader understands it a little bit better, although, admittedly, many of the explanations are just manipulations with formulas, which don’t give much intuition. The implication looks simple when you don’t dig deep, but when you try to formalize it, it is getting very counterintuitive due to its nature and I doubt that it is possible to make the implication look any more natural.

Leave a Reply

Your email address will not be published. Required fields are marked *