Managers of the round table

This is an old but good puzzle, which I personally enjoyed solving quite a while ago.

The seats of a round table are labeled with the names of \(N\) managers, but every manager has taken a wrong seat. Prove that you can always rotate the table so that at least two managers end up at the seats with their own names.

This problem is quite simple, so I won’t provide the solution, but feel free to share yours in the comments. As a hint, I can say that the whole solution shouldn’t be longer than one or two sentences.

Why is implication so weird?

Everyone who learned formal logic knows the truth table for implication:

| \(A\) | \(B\) | \(A \Rightarrow B\) |
|---|---|---|
| 0 | 0 | 1 |
| 0 | 1 | 1 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

The last two rows don’t raise any concerns, but the first two say that anything follows from a false statement. According to this table, the following statements are all true:

  1. 2 + 2 = 4 follows from 2 + 2 = 5
  2. The Earth has existed for no more than 10 years, therefore 2 + 2 = 5
  3. The Sun is black, so it follows that the Sun is red
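For readers who prefer to check such tables mechanically: material implication has the standard classical encoding `(not a) or b`, and the following sketch just prints the table above.

```python
# Material implication under its standard classical encoding: (not a) or b.
def implies(a: bool, b: bool) -> bool:
    return (not a) or b

# Print the full truth table; rows with a false antecedent all come out true.
for a in (False, True):
    for b in (False, True):
        print(int(a), int(b), int(implies(a, b)))
```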

All of these are obviously absurd, yet professors keep teaching this at universities. I’ll try to explain why. I won’t go into much detail and will sometimes take a very liberal approach to the terminology, aiming for the reader’s intuition rather than formal completeness. Hopefully, this won’t become an obstacle and the reader will benefit from an informal look at different views on this topic.

(Non-) explanation 1: By definition
You can make up an arbitrary binary operation (there are just 16 of them in classical logic) and call it whatever you want. That would be a perfectly valid definition to work with: after all, any definition is nothing more than an agreement on terminology. This is the most common “explanation” people hear at universities.

Even though such an explanation is perfectly valid, it doesn’t give you much. Why exactly was this definition chosen among the 16 possibilities, and why is it called “implication”? That remains unclear. Also keep in mind that if a teacher tells you implication is just the operation given by the truth table, you are in no position to derive paradoxes like the ones above, since this approach never defines a connection between the truth table and the meaning of natural language.

(Non-) explanation 2: The paradoxical part doesn’t matter
I’ve seen this explanation in several lecture notes published online. If we apply implication to deduce true statements from true statements, we are fine. On the other hand, if we try to deduce something from false statements, we are in trouble anyway, since we assume something which is not true. So, the part of the truth table for false assumptions doesn’t matter.

This explanation is not entirely wrong, but it misses one important point: the paradoxical implication is used in mathematics, even though in most cases not directly. Let’s see how it appears in practice.

In calculus we define an open set as a set \(A\) such that for any \(x \in A\) there is some neighborhood \(N\) of \(x\) which lies within \(A\). We also treat the empty set as open, and paradoxical implication explains why. This is how we write that \(A\) is open formally (for simplicity I am skipping a formal definition of “neighborhood” here):

$$x \in A \Rightarrow \exists N. x \in N \land N \subset A$$

If we substitute the empty set for \(A\), we see that this property holds by paradoxical implication, since \(x \in \emptyset\) is always false. So the fact that the empty set is open is not part of the definition, but a theorem. This statement may look innocent, but it is critical. If we didn’t accept paradoxical implication and decided to treat the empty set as non-open, the whole theory would break. For example, we can prove that the intersection of two open sets is open, but the intersection of two disjoint open sets is empty, so the empty set must be open, otherwise we get a contradiction.

For another example where the paradoxical implication is used in a mathematical proof directly, you can check Gödel’s ontological argument, explained in this blog recently (see the proof of theorem 1).

As a deeper example, consider proofs by contradiction in general. Let’s say that we proved that \(B\) follows from \(A\). We also proved that \(B\) is false. We conclude that \(A\) must be false as well, since otherwise, we get that \(B\) is true (as we said, it follows from \(A\)), contrary to what we know. This kind of reasoning is common in mathematics, but how can we prove that proofs by contradiction do actually work?

It appears that proof by contradiction depends directly on paradoxical implication. We’ll make use of a logical rule called case analysis: if we proved that \(A\Rightarrow C\) and \(B\Rightarrow C\) and we know that \(A\lor B\) is true, then \(C\) is true as well. It is quite intuitive if you meditate on this for a minute or two, so I am not getting into details here. We consider this property as an axiom.

Let’s say that we proved \(A\Rightarrow B\) and \(\neg B\) (which means that \(B\) is false). We assume that all statements are either true or false, so \(A\lor\neg A\). We’ll apply case analysis to derive \(\neg A\) from here. First of all, we trivially have \(\neg A\Rightarrow \neg A\). If we assume \(A\) instead, then by \(A\Rightarrow B\) we know that \(B\) is true, but it is a contradiction, so by paradoxical implication we can derive anything from here, including \(\neg A\). So we got that \(\neg A\) follows from both \(A\) and \(\neg A\) and by case analysis we conclude \(\neg A\).
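The argument can also be sanity-checked semantically: in classical logic, the modus tollens pattern \(((A \Rightarrow B) \land \neg B) \Rightarrow \neg A\) is a tautology, which a brute-force sketch over all truth assignments confirms (this checks the truth tables, not the syntactic derivation above).

```python
from itertools import product

# Classical material implication.
def implies(a, b):
    return (not a) or b

# ((A => B) and not B) => not A must come out true under every assignment
# for proof by contradiction to be semantically valid.
modus_tollens_valid = all(
    implies(implies(a, b) and (not b), not a)
    for a, b in product((False, True), repeat=2)
)
print(modus_tollens_valid)  # True
```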

This is how proof by contradiction works under the hood. Other interpretations of this rule are also possible, but this one is the most common and I hope sufficient as an example.

Explanation 3: Logical necessity
There are a number of properties we want the implication operation to follow. For example, if \(A\Rightarrow B\) and \(B\Rightarrow C \) then we expect \(A\Rightarrow C\). This property is sometimes called transitivity and we can write it as

$$((A \Rightarrow B) \land (B \Rightarrow C)) \Rightarrow (A \Rightarrow C)$$

We consider this formula an axiom, so it must be true irrespective of the truth values of \(A, B, C\). Assume now that we already know the truth table of implication for a true assumption and we want to fill in the remaining entries of the table:

| \(A\) | \(B\) | \(A \Rightarrow B\) |
|---|---|---|
| 0 | 0 | ? |
| 0 | 1 | ? |
| 1 | 0 | 0 |
| 1 | 1 | 1 |

If we substitute 1 for \(A\) and \(C\) and 0 for \(B\) in the transitivity axiom, we get the following formula

$$((1 \Rightarrow 0) \land (0 \Rightarrow 1)) \Rightarrow (1 \Rightarrow 1)$$

Using the truth table for the known values and taking into account that \(X\land 0 = 0\) for all \(X\), we can rewrite this statement as \(0 \Rightarrow 1\). Since we take transitivity as an axiom, this final statement must be true.

Similarly, if in the example above we take \(C\) to be false, we get that \(0 \Rightarrow 0\) is also true. So by demanding that implication be transitive, we were able to fill in its truth table. Any other truth table would falsify the transitivity property.
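This can be verified exhaustively: among all 16 binary operations, only one agrees with the known half of the table and keeps the transitivity formula a tautology. A small sketch (the encoding of operations as 4-tuples is my own):

```python
from itertools import product

# A binary operation is a table: (value at (0,0), (0,1), (1,0), (1,1)).
# Keep the operations that agree with the known half of the implication
# table (1 => 0 gives 0, 1 => 1 gives 1) and make the transitivity formula
# ((A op B) and (B op C)) op (A op C) a tautology.
survivors = []
for table in product((0, 1), repeat=4):
    op = lambda a, b, t=table: t[2 * a + b]
    if op(1, 0) != 0 or op(1, 1) != 1:
        continue
    if all(op(op(a, b) & op(b, c), op(a, c))
           for a, b, c in product((0, 1), repeat=3)):
        survivors.append(table)
print(survivors)  # [(1, 1, 0, 1)] -- classical implication is the only option
```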

Transitivity is not the only property which forces the truth table to be what it is. As an even simpler example, we can consider the property

$$\neg (A \Rightarrow B) \Leftrightarrow (A \land \neg B)$$

This equivalence is also what we naturally expect from an operation called “implication”: we are sure that there is no implication exactly when we can exhibit a true assumption with a false consequence.

Explanation 4: By definition II

So what we get is that the truth table for implication is forced by the properties we want it to have, but this is only half of the story. The previous explanation leaves the false impression that paradoxical implication is forced by the fact that any statement is either true or false. This is called the classical assumption, and it is not as obvious as you might think. There are plenty of different kinds of logic which either have more truth values or don’t make any assumptions about truthfulness at all.

If we don’t assume that everything is true or false, we cannot really have a truth table, but we still have alternatives. Before I present them, I have to explain how logical systems are defined in general.

The science of logic has two basic purposes. The first one is to provide a language which captures only a “logical essence” of statements without attaching any particular meaning to them. The second is to give rules according to which statements can be derived from each other. Consider a classical example:

  • All humans are mortal
  • Socrates is a human
  • Therefore, Socrates is mortal

This is trivial logical reasoning, but it is quite a general one. We can apply the same logic to a completely different problem domain:

  • All cats are beautiful
  • Tom is a cat
  • Therefore, Tom is beautiful

The situation is different, but you may notice that the reasoning is exactly the same. We can apply it even to absurd statements:

  • All quipee are totoked of unkes
  • Kupe is quipee
  • Therefore, kupe is totoked of unkes

We don’t understand what “quipee”, “unkes” and “kupe” are, but we can see that at least the reasoning is correct if such things ever exist. Logic abstracts away from the specific meaning of the phrases and allows us to concentrate only on the derivations of one statement from another. Using formulas, the same deduction can be written as

  • \(P(x) \Rightarrow Q(x)\)
  • \(P(a)\)
  • Therefore, \(Q(a)\)

Here instead of sentences of language we use formulas, which lack any meaning, but which still allow us to capture the pattern of reasoning.

The simplest logical system known to me is the one taught in universities under the name “propositional calculus”. The syntax of its formulas is very simple. A formula is one of three things:

  1. A letter: A, B, etc. Letters symbolize logical statements, the exact meaning of which is not important.
  2. Expression \((A) \land (B)\), where \(A\) and \(B\) are formulas.
  3. Expression \(\neg (A)\), where \(A\) is a formula.

The brackets around \(A\) and \(B\) above allow us to parse formulas unambiguously. For example, the expression \(\neg ((A) \land (B))\) is parsed in exactly one way, while the formula \(\neg A \land B\) can be interpreted either as \(\neg (A \land B)\) or as \((\neg A)\land B\). In practice we often omit excessive brackets and rely on priority rules for logical operations, but from the formal point of view this is not needed.

For some expressions we define shorthands. As the first example, we can define \(A\lor B\) to be a shorthand for the expression \(\neg((\neg A)\land(\neg B))\). If you meditate on it for a moment, you’ll see that it matches our intuitive understanding of the operation “or”: to say “A or B” is the same as to say “it is not true that both A and B are false”. The difference from the truth-table approach is that we don’t actually assign any truth value to the operation \(\lor\); we only say that it is a shorthand.

In the same way, we can say that \(A\Rightarrow B\) is just a shorthand for \(B\lor\neg A\). Once again, we don’t assume that every statement is true or false and we don’t define the truth table, we just say that \(\Rightarrow\) is a shorthand for a particular syntax construct, which happens to intuitively match our definition of “implication”. To say that “A implies B” is the same as to say that “either B is right, or A cannot be right”.

At this point, if we wish, we can introduce truth tables. We only need to define the truth tables for the initial operations “and” and “not”, and from there we can conclude what the truth table of implication is. As a non-classical example, consider Kleene’s ternary logic, where apart from “true” and “false” we also have the logical value “unknown”. In the table below (U stands for “unknown”) the “and” and “not” operations are taken as given, while the truth values for “or” and implication are derived using the shorthand definitions:

| \(A\) | \(B\) | \(A\land B\) | \(\neg A\) | \(A\lor B\) | \(A\Rightarrow B\) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 1 |
| 0 | U | 0 | 1 | U | 1 |
| 0 | 1 | 0 | 1 | 1 | 1 |
| U | 0 | 0 | U | U | U |
| U | U | U | U | U | U |
| U | 1 | U | U | 1 | 1 |
| 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | U | U | 0 | 1 | U |
| 1 | 1 | 1 | 0 | 1 | 1 |
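The derived columns can be generated from the primitives alone; here is a sketch that encodes false/unknown/true as 0/0.5/1 (a common trick, since Kleene’s “and” is then just \(\min\) and “not” is \(1-x\)):

```python
# Kleene's strong ternary logic with false/unknown/true encoded as 0/0.5/1.
# "and" and "not" are the primitives; "or" and implication are the
# shorthands not(not(a) and not(b)) and or(not(a), b) from the text.
VALUES = {0: "0", 0.5: "U", 1: "1"}

def k_and(a, b): return min(a, b)
def k_not(a):    return 1 - a
def k_or(a, b):  return k_not(k_and(k_not(a), k_not(b)))   # works out to max(a, b)
def k_imp(a, b): return k_or(k_not(a), b)                  # works out to max(1 - a, b)

for a in VALUES:
    for b in VALUES:
        print(VALUES[a], VALUES[b], VALUES[k_and(a, b)],
              VALUES[k_not(a)], VALUES[k_or(a, b)], VALUES[k_imp(a, b)])
```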

Another example is fuzzy logic. In this logic, statements can be “partially true”, which is represented by logical values in the range \(\left[0;1\right]\). Logical “and” in this case is defined as taking the minimum of two reals: \(A\land B = \min (A, B)\). Operation “not” is defined as \(\neg A = 1 - A\).

Using the shorthand rule given above, we can now conclude that operation “or” is given by the formula
$$A\lor B = 1 - \min(1 - A, 1 - B) = \max(A, B)$$
and implication by the formula
$$A\Rightarrow B = \max(1 - A, B)$$
You can see that if \(A\) is false (its value is 0), then the implication is true (its value is 1). At the same time, if \(A\) is true, then the value of the implication equals the value of \(B\).

There is a caveat, though. In classical logic, we can say that
$$A\Rightarrow B = (\neg A)\lor B = (\neg A)\lor (A \land B)$$
but if we use the latter expression as the definition of implication in fuzzy logic, we get a different formula (I use the prime sign \(\prime\) to distinguish it from the previous definition):
$$A\Rightarrow' B = \max(1 - A, \min(A, B))$$

Although this formula gives the same results when \(A\) is 0 or 1, for other values the two definitions are not equivalent. If we take, for example, \(A=0.5\) and \(B=0.75\), then according to our formulas
$$A \Rightarrow B = 0.75$$
$$A \Rightarrow’ B = 0.5$$

Which formula is correct? It is impossible to say. We must admit that the choice to define implication as \((\neg A)\lor B\) is quite arbitrary: it works for some logics, doesn’t work for others, and sometimes several different definitions are possible.
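The disagreement is easy to reproduce; here is a two-line sketch of both fuzzy implications (the \(\max(1-A, B)\) form and the primed variant from the text):

```python
# Two fuzzy implications derived from the two classically equivalent
# formulas (not A) or B and (not A) or (A and B).
def f_imp(a, b):        # max(1 - A, B)
    return max(1 - a, b)

def f_imp_prime(a, b):  # max(1 - A, min(A, B))
    return max(1 - a, min(a, b))

a, b = 0.5, 0.75
print(f_imp(a, b), f_imp_prime(a, b))  # 0.75 0.5: equal classically, not here
```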

Explanation 5: Derivation rules

In the previous explanation, we mentioned that logic consists of two parts: syntax and derivations. We explained how an implication can be treated just as a shorthand for a logical formula. Here we’ll show that it also matches our intuitive idea of derivation.

Let’s introduce some notation. If formula \(B\) can be derived from formulas \(A_1,\ldots,A_n\), we denote it as \(A_1,\ldots,A_n\vdash B\). For our purposes we will need only four derivation rules:

  1. \(A\land B \vdash A\),
  2. \(A \land B \vdash B\),
  3. \(A \vdash A\lor B \), for arbitrary \(B\),
  4. \(A\lor B, \neg A \vdash B \).

You can easily check that these rules hold in classical logic, but in fact we expect them to be valid for any kind of logic, not only the classical one.

We want to define implication in such a way that \(A \Rightarrow B\) if and only if we can derive \(B\) from assumption \(A\). We can formulate it as a rule:

5. If \(A \vdash B\), then \(\vdash A \Rightarrow B\) (no assumption to the left of \(\vdash\) here, in other words if we need assumption \(A\) to derive \(B\), then we can derive \(A \Rightarrow B\) without assumption \(A\)).

Let \(\bot\) denote a statement which cannot be true:

6. \(\bot = A \land (\neg A)\) for arbitrary \(A\).

From here we can derive an interesting property of implication. Let’s take some arbitrary statement \(B\). Then

  1. Let’s assume \(\bot\).
  2. From assumption 1 we derive \(A\) by applying rule 1 to the definition of \(\bot\).
  3. From assumption 1 we derive \(\neg A\) by applying rule 2 to the definition of \(\bot\).
  4. From statement 2 we derive \(A\lor B\) by rule 3.
  5. Applying rule 4 to statements 3 and 4, we conclude \(B\).

So we managed to derive \(B\) from \(\bot\), which means that \(\bot \Rightarrow B\) by rule 5. This is exactly what we were looking for: any statement follows from an absurdity, but this time we managed to show it based only on very generic derivation rules, not on any particular kind of logic. The rule holds even if we don’t assume that every statement is either true or false.

Explanation 6: Models

Let’s say that the universe is entirely described by a number of logical variables. For example, let’s say that just three variables \(A, B, C\) are enough. Each particular assignment of values to these logical variables gives us a particular instantiation of the universe. We call such instantiations models. In the example with three variables we will have 8 different models of the universe.

Now let’s say that we know some facts about the universe, which we are going to call axioms. For our three-variable world consider the following two axioms:

  1. \(A\lor B\)
  2. \(B = C\)

These axioms put restrictions on what the universe can be, so we have to throw away some of the models. In fact, after introducing these two axioms we have just three possible models remaining:

  1. \(A = 1, B = 0, C = 0\)
  2. \(A = 0, B = 1, C = 1\)
  3. \(A = 1, B = 1, C = 1\)

All other models are invalid. For instance, model
$$A = 0, B = 0, C = 0$$
is invalid because it doesn’t match the first axiom, and model
$$A = 1, B = 0, C = 1$$
invalidates the second axiom.

Now we can define implication:

Definition. \(A\Rightarrow B\), if and only if in any model where \(A\) is true, \(B\) is also true.

If we add axiom \(A\Rightarrow B\) to the set of axioms which we had before, the list of valid models narrows down to just two of them:

  1. \(A = 0, B = 1, C = 1\)
  2. \(A = 1, B = 1, C = 1\)

You can see from this example that with the axiom \(A\Rightarrow B\), if \(A\) is true in some model, then \(B\) must also be true. At the same time, if \(A\) is false, the constraint imposed by this axiom simply doesn’t apply, although we still claim that the axiom holds. Since \(A\Rightarrow B\) is an axiom, it must be true in all models, including those where the constraint doesn’t apply. Thus we conclude that \(A\Rightarrow B\) is true when \(A\) is false.
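Model filtering of this kind is easy to mechanize; here is a sketch that enumerates all 8 assignments to \((A, B, C)\) and applies the axioms from the text:

```python
from itertools import product

# Enumerate all 8 assignments to (A, B, C) and keep the models satisfying
# the axioms from the text: A or B, and B = C.
models = [(a, b, c) for a, b, c in product((0, 1), repeat=3)
          if (a or b) and b == c]
print(models)   # [(0, 1, 1), (1, 0, 0), (1, 1, 1)]

# Adding the axiom A => B ("in every model where A holds, B holds")
# narrows the list further; models where A is false pass automatically.
models = [(a, b, c) for a, b, c in models if (not a) or b]
print(models)   # [(0, 1, 1), (1, 1, 1)]
```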


There are many other ways of looking at implication. Some are purely philosophical, others involve more advanced mathematics. Hopefully, the approaches I demonstrated shed some light on this topic and the reader now understands it a little better, although, admittedly, many of the explanations are just manipulations with formulas which don’t give much intuition. Implication looks simple as long as you don’t dig deep, but once you try to formalize it, it gets very counterintuitive, and I doubt that it is possible to make it look any more natural.

Unexpected expected value

Let’s say that random variables \(x_i\) are distributed uniformly on the segment \([0;1]\). How many such variables do we have to add to get a sum bigger than 1? Formally, if \(n\) is the minimum value such that \(\sum_{i=1}^{n}x_i > 1\), what is the expected value of \(n\)? Surprisingly, it is \(e\).

Let \(m\) be a natural number. Denote by \(p_m\) the probability that we need more than \(m\) variables for their sum to exceed 1. Geometrically, this probability is just the volume of the \(m\)-dimensional simplex \(X = \{x \mid \sum_{i=1}^m x_i < 1\}\). In particular, \(p_0 = 1\). This volume can be calculated directly, but there is an elegant probabilistic proof which I’ll demonstrate.

Consider the random variables

$$\begin{eqnarray}y_1 &=& x_1\\ y_2 &=& x_1 + x_2\\ y_3 &=& x_1 + x_2 + x_3\\ \ldots\\ y_m &=& x_1 + x_2 + \ldots + x_m \end{eqnarray}$$

It is obvious that this sequence is increasing and that \(x\in X\) iff \(y \in Y = \{0 \le y_1 \le y_2 \le \ldots \le y_m \le 1\}\). The linear operator which takes \(\{x_i\}\) to \(\{y_i\}\) has determinant 1, so the volumes of \(X\) and \(Y\) are exactly the same.

The volume of \(Y\) is easily calculated. It is the probability that \(m\) random values would appear in a sorted order, which is \(\frac{1}{m!}\). We conclude that \(p_m = \frac{1}{m!}\).

The probability that the sum of the first \(m\) variables exceeds 1 is \(q_m = 1 - p_m\), so the probability that we need exactly \(m\) variables to exceed 1 is given by \(q_m - q_{m-1} = p_{m-1} - p_m\). The expected number of variables which we have to add is then

$$E\left[n\right] = \sum_{m=1}^\infty m (p_{m-1} - p_m)$$

We can get an easier equivalent expression if we consider how we can rearrange a partial sum:

$$\begin{eqnarray} \sum_{m=1}^k m(p_{m-1} - p_m) &=& p_0 - p_1 + 2p_1 - 2p_2 + 3p_2 - 3p_3 + \ldots + kp_{k-1} - kp_k\\ &=& p_0 + p_1 + p_2 + \ldots + p_{k-1} - kp_k\\ &=& \left(\sum_{m=0}^{k-1} \frac{1}{m!}\right) - kp_k \end{eqnarray}$$

Since \(kp_k = \frac{k}{k!} \to 0 \), we conclude that

$$E\left[n\right] = \sum_{m=0}^\infty \frac{1}{m!} = e$$
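A quick Monte Carlo sketch (with an arbitrary seed of my choosing) agrees with the calculation:

```python
import random

# Estimate E[n]: draw uniform variables until their running sum exceeds 1,
# record how many were needed, and average over many trials.
def count_until_exceeds_one(rng):
    total, n = 0.0, 0
    while total <= 1.0:
        total += rng.random()
        n += 1
    return n

rng = random.Random(0)
trials = 200_000
estimate = sum(count_until_exceeds_one(rng) for _ in range(trials)) / trials
print(estimate)  # close to e = 2.71828...
```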

Arbitrary distribution using biased coin

Problem. Let’s assume that you have a biased coin, but you don’t know the probabilities of its sides. Using the coin, how would you choose one of the numbers 1, 2 or 3 with equal probability?

Stop reading here and try solving the problem yourself before moving on.

Solution. First of all, I have to note that this problem has a trivial solution that works only for this particular question, not in the general case. One such solution was proposed by a reader in the comments section. In this post, we present a solution that will work for arbitrary distributions, no matter how asymmetric they are.

The problem becomes easier if we split it into two parts. First we will learn how to emulate a fair coin using a biased one. After that, we describe a procedure which allows us to get numbers from an arbitrary probability distribution.

Let’s denote the probability of tails as \(p\). Then if we toss the coin twice, we can get one of the four outcomes:

  1. Tails both times, with probability \(p^2\);
  2. Heads both times, with probability \((1-p)^2\);
  3. Tails the first time, heads the second time, with probability \(p(1-p)\);
  4. Heads the first time, tails the second time, with probability \(p(1-p)\).

As you can see, outcomes 3 and 4 are equally likely. Thus, in order to emulate an experiment with probability 50/50 (a fair coin), we can toss the coin twice. If both tosses give the same outcome, we discard them and toss again. When one toss gives tails and the other heads, we take the outcome of the first toss as the result.

Example. Consider a sequence of tosses T, T, H, H, T, H, where I denoted tails as T and heads as H. This sequence describes an experiment in which we toss the coin twice and both times we get tails. We discard this result and toss twice again. This time we get two heads. Now, we toss twice one more time. First, we get tails and then heads. We write “tails” as our outcome.
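This rejection procedure (often attributed to von Neumann) can be sketched in a few lines; the `biased` callable and its 0.8 bias are made up for the demo:

```python
import random

# Emulate a fair coin with a biased one: toss twice, keep only mixed
# outcomes, and report the first toss of the accepted pair.
def fair_toss(biased):
    while True:
        first, second = biased(), biased()
        if first != second:
            return first

rng = random.Random(0)
biased = lambda: rng.random() < 0.8   # hypothetical coin, True means tails
tosses = [fair_toss(biased) for _ in range(100_000)]
frequency = sum(tosses) / len(tosses)
print(frequency)  # close to 0.5 despite the 0.8 bias
```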

From now on, we can assume that we have a fair coin. We will toss it repeatedly and interpret the tosses as digits in the binary representation of a number \(s\) in the interval \([0, 1]\):

$$s=\sum\limits_{i=1}^{+\infty} \frac{t_i}{2^i}$$

Here \(t_i\) denotes the outcome of the \(i\)-th toss and can be either 0 or 1. In order to get one of the three values at random, we split \([0,1]\) into three subintervals \([0, 1/3)\), \([1/3, 2/3)\), \([2/3, 1]\). (Since any particular point like 1/3 has zero probability, we don’t care which interval we assign it to.) The outcome of the experiment is the index of the subinterval in which \(s\) lies. Since each digit of \(s\) is uniform, \(s\) itself is uniform in \([0,1]\).

In fact, we don’t need the exact value of \(s\). The only interesting bit is the subinterval in which \(s\) lies, so we don’t have to make an infinite number of tosses. Let’s say that the first toss gave us 1. We know that \(s=0.1\ldots\) (in binary), so \(s\in[1/2, 1]\), which overlaps both the second and the third subintervals. If the next toss gives us 0, then \(s=0.10\ldots\) and \(s\in[1/2, 3/4]\), which is still ambiguous. If the next toss gives us 0 again, then \(s=0.100\ldots\) and \(s\in[1/2, 5/8]\), which is a subinterval of \([1/3, 2/3)\). The outcome is the second value.

This scheme generalizes in an obvious way: to get an arbitrary distribution, we just split \([0,1]\) into subintervals whose lengths match the probabilities of the values.
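Putting the two halves together, here is a sketch of the interval-refinement sampler driven by fair coin flips (function and variable names are my own):

```python
import random

# Sample an index from an arbitrary discrete distribution using only fair
# coin flips: refine the binary interval containing s until it fits
# entirely inside one subinterval of [0, 1].
def sample(probs, flip):
    bounds, acc = [], 0.0
    for p in probs:
        bounds.append([acc, acc + p])
        acc += p
    bounds[-1][1] = 1.0          # guard against floating-point drift
    lo, hi = 0.0, 1.0
    while True:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if flip() else (lo, mid)
        for i, (a, b) in enumerate(bounds):
            if a <= lo and hi <= b:
                return i

rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(30_000):
    counts[sample([1/3, 1/3, 1/3], lambda: rng.random() < 0.5)] += 1
print(counts)  # all three counts close to 10,000
```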

Infinite sets

In this post, I’ll cover more or less the basic topics which are usually covered in any set theory course. I expect that the theorems presented in this post will be used extensively in my next articles and I will be referencing this post a lot.

I skip many formal details and get straight to the main ideas of the proofs, but all of them can be made rigorous without much effort. I also present simpler results without proofs, in the form of exercises. I am too lazy to prove each bit of trivialities, sorry.

We will start from a seemingly absurd statement.

Theorem. A unit line segment \([0, 1]\) contains exactly the same number of points as a filled unit square \([0, 1]^2\).
Proof. Consider the decimal representation of the points of the unit segment and let the function \(f\) map these points to the unit square by sending all odd-positioned digits of the decimal representation to the first coordinate of the square and all even-positioned digits to the second coordinate. This cryptic rule can be made clear with an example. The point \(0.1234567\) has odd-positioned digits 1, 3, 5, 7 and even-positioned digits 2, 4, 6 in its decimal representation, so \(f\) takes it to the point with coordinates \((0.1357, 0.246)\). It is easy to see that this mapping is one-to-one. QED.
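The interleaving map and its inverse can be sketched on finite digit strings (the real proof works on infinite expansions, but the mechanics are the same):

```python
# Digit interleaving from the proof, on finite decimal expansions
# written as strings of digits after the decimal point.
def to_square(digits: str) -> tuple:
    # odd-positioned digits go to x, even-positioned digits go to y
    return digits[0::2], digits[1::2]

def to_segment(x_digits: str, y_digits: str) -> str:
    out = []
    for i in range(max(len(x_digits), len(y_digits))):
        if i < len(x_digits): out.append(x_digits[i])
        if i < len(y_digits): out.append(y_digits[i])
    return "".join(out)

print(to_square("1234567"))        # ('1357', '246')
print(to_segment("1357", "246"))   # '1234567'
```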

Exercise. As promised, I provide quite informal proofs and sometimes admit holes in them. The proof above has one such hole. Can you see it? Can you resolve it? (Hint: some numbers have two decimal representations; for example, the number 0.2 can also be represented as the infinite fraction 0.1999….)

This proof can easily be extended to the case of a unit cube, or even a unit \(n\)-dimensional cube. Instead of mapping a point to two coordinates, map it to \(n\) coordinates, using every \(n\)-th digit of the decimal representation for each.

This result is quite counter-intuitive and may seem even wrong. After all, a line segment has area 0, but a unit square has area 1. A unit square has volume 0, but a unit cube has volume 1. A unit square consists of an infinite number of unit line segments, while a unit cube consists of an infinite number of squares. How can it be that these three figures have the same number of points?

The catch here is in the terminology. When we say “line segment”, “square” or “cube” we immediately visualize geometric objects. The same terminology as in geometry is used in set theory, but set theory lacks all of the geometric sense. Things like “length”, “angle” or “area” are just not defined in set theory and we cannot make any references to them.  Without these notions, set-theoretic figures are not exactly the same as geometric figures, even though we can describe them using coordinates.  In particular, you are not supposed to draw set-theoretic figures on a sheet of paper, since any picture immediately suggests geometric information which doesn’t exist. As soon as you drop all of your geometric intuition, the only remaining way of dealing with these figures is to analyze functions that map points between them.

The same consideration applies to physical intuition. Mathematical objects don’t usually have mass, atoms, density, etc., so when we stretch objects, they don’t become less dense. As an example, take the function \(f:x\mapsto 2x\), which maps the interval \([0,1]\) to \([0,2]\). Clearly, \([0,1]\) is a subset of \([0,2]\), but at the same time \(f\) is one-to-one, so each point of \([0,1]\) can be matched uniquely with a point of \([0,2]\) and vice versa. From one point of view, \(f\) stretches the interval twofold, since any subinterval is mapped to a subinterval of twice the length (e.g. \([0.1,0.2]\) is mapped to \([0.2, 0.4]\)). At the same time, its image is no less “dense” than its preimage, so in this sense the function just doubles the interval. Once again, all of this seems paradoxical only while you appeal to physical intuition. As soon as you accept the idea that sets have nothing to do with geometry or physics, you should be fine with these abstractions.

The preceding paragraph can be summarized as a rule: stretching an object doesn’t change the number of points in it.

To make this reasoning clearer, consider the set of integers. Are there more integers than even numbers? From one point of view, yes, since every even number is an integer but not every integer is even. At the same time, the function \(x\mapsto 2x\) matches each integer with exactly one even number and vice versa. This makes us conclude that the numbers of integers and of even integers are the same.

Which answer is correct? In fact, both could be, depending on how you define “more”. As soon as we leave the world of finite sets, we cannot really say that one set has more elements than another in the same sense in which we compare sets of 2 and 3 elements. Looking at subsets is not very productive, though. We know that \([0,1]\subset[0,2]\), but at the same time \([100, 101]\not\subset[0,2]\), even though it is the same set shifted along the real line. We clearly don’t want moving sets around to affect our reasoning about their sizes. Thus it appears more natural to compare the cardinality of sets using one-to-one mappings, and this is the definition that is usually used.

Definition. A function \(f\) is called an injection if it maps different elements to different elements. Formally, \(f\) is injective if \(f(x)=f(y)\) implies \(x=y\).

Definition. We say that set \(A\) is not bigger than set \(B\) (or has no greater cardinality, or no more elements) and denote it as \(|A|\le|B|\) if there is an injection from \(A\) to \(B\). We say that set \(A\) has the same cardinality as \(B\) (or the same number of elements) and denote it as \(|A|=|B|\) if there is a one-to-one mapping between \(A\) and \(B\).

Definition. We say that set \(A\) has the cardinality of the continuum if there exists a one-to-one mapping between \(A\) and a unit segment. We denote this as \(|A|=\mathfrak{c}\).

Schröder-Bernstein Theorem. If \(|A|\le|B|\) and \(|B|\le|A|\), then \(|A|=|B|\).

The statement of the theorem seems quite intuitive, but given all of the subtlety of infinite sets, this requires a formal proof. I am skipping it, but if you are interested you can check one of the approaches on Wikipedia.

Exercise. Prove that the whole real line \(\mathbb{R}\) has the cardinality of the continuum.

Exercise. Prove that an arbitrarily dimensional space \(\mathbb{R}^n\) has the cardinality of the continuum.

Exercise. Given \(|A|\le|B|\le|C|\) and \(|A|=|C|\), prove that \(|A|=|B|\).

Consider an arbitrary figure in \(n\)-dimensional space. If this figure has a subset of continuum cardinality, we end up in the situation of the exercise above, and the figure itself has the cardinality of the continuum. In particular, no figure in a finite-dimensional space can have cardinality bigger than the continuum.

This result raises an interesting question: is it possible at all to have the cardinality bigger than the continuum? Before answering this question let’s consider a couple of more examples.

Definition. Set \(A\) is called countable if it contains the same number of elements as the set of natural numbers \(\mathbb{N}= \{0, 1, 2, 3, \ldots\}\). This is the same as to say that all elements of \(A\) can be arranged into an infinite sequence. We denote it as \(|A|=\aleph_0\).

The following exercises are elementary and compulsory to be solved before you continue.

Exercise. Prove that the union of a countable number of countable sets is countable.

Exercise. Prove that the Cartesian product of two countable sets is countable.

Exercise. Prove that the sets of integers and of rational numbers are countable.

Exercise. Prove that the set of finite words over a finite alphabet is countable.

Exercise. Prove that the set of polynomials with integer coefficients is countable.

All of the statements in the exercises above are quite simple, but at the same time important. I think that you will be able to solve these quite easily, otherwise just try to search for the solution on the Internet.

From the exercises we see that products and unions of countable sets are always countable. All fractions make up a countable set as well. This should make us wonder whether uncountable sets exist at all. In particular, is the continuum countable? The answer to this question is negative. The traditional way of proving it is to employ “Cantor’s diagonal argument”, one form of which I present here.

Cantor’s Theorem. For an arbitrary set \(A\), its powerset (i.e. the set of all its subsets, which we denote by \(2^A\)) has a strictly bigger cardinality than \(A\).

Proof. We will prove this theorem by contradiction. Let’s assume that there is a one-to-one correspondence \(f: A \to 2^A\); in particular, every subset of \(A\) is \(f(a)\) for some \(a\in A\). Let $$X=\{x\in A | x\not\in f(x)\}$$ Since \(X\) is a subset of \(A\), there is \(a\in A\) such that \(f(a) = X\). If we assume that \(a\in X\), then by the definition of \(X\) we get \(a\not\in f(a)=X\), which is a contradiction. The assumption that \(a\not\in X\) leads to the same contradiction. Thus both possibilities are impossible, and our initial assumption that such an \(f\) exists is false. We conclude that there is no one-to-one correspondence between \(A\) and \(2^A\). QED.
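The diagonal construction can be checked exhaustively on a small finite set. This Python sketch (my own illustration, not part of the proof) tries every possible map from a 3-element set into its powerset and verifies that the diagonal set is always missed:

```python
from itertools import combinations, product

def diagonal_misses_every_map(elements):
    """For every f: A -> 2^A, check that X = {x : x not in f(x)} is not in f's image."""
    subsets = [set(c) for r in range(len(elements) + 1)
               for c in combinations(elements, r)]
    for images in product(subsets, repeat=len(elements)):
        f = dict(zip(elements, images))
        # Cantor's diagonal set for this particular f:
        X = {x for x in elements if x not in f[x]}
        if any(f[a] == X for a in elements):
            return False  # would contradict the diagonal argument
    return True

print(diagonal_misses_every_map([0, 1, 2]))  # True
```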

If \(|A|=a\), then for the cardinality of the set of all subsets of \(A\) we will write \(|2^A| = 2^a\).

Theorem. \(\mathfrak{c} = 2^{\aleph_0}\).

Proof. Any number within the unit segment can be represented as a binary fraction between 0 and 1 (e.g. something like 0.11001001110…). If we match each digit position in this representation to a natural number, we map each number within the unit segment to the subset of \(\mathbb{N}\) consisting of the positions where the digit 1 occurs, and conversely every subset of \(\mathbb{N}\) gives such a number. Some numbers have two binary representations (e.g. 0.0111… = 0.1000…), but those form only a countable set and do not affect the cardinality. Thus \(\mathfrak{c} = 2^{\aleph_0}\); in particular, by Cantor’s theorem the cardinality of the unit segment is strictly bigger than the cardinality of \(\mathbb{N}\). QED.
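The digit/subset correspondence from the proof can be illustrated for finite subsets: a natural number \(n\) in the subset contributes the binary digit 1 at position \(n+1\) after the binary point, i.e. the value \(2^{-(n+1)}\). The function name below is mine, for illustration only.

```python
def subset_to_real(subset):
    """Map a finite subset of N to the real in [0, 1] with 1-digits at those positions."""
    return sum(2.0 ** -(n + 1) for n in subset)

print(subset_to_real({0}))        # 0.5    (binary 0.1)
print(subset_to_real({0, 2, 3}))  # 0.6875 (binary 0.1011)
```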

From here we can immediately make some practical statements. The first two of them demonstrate why it makes sense to compare sets using one-to-one mappings.

Statement 1. There are real numbers which are not solutions to any polynomial equation with integer coefficients (numbers which are solutions of such equations are called algebraic, and numbers which are not are called transcendental; this statement says that transcendental numbers exist).

This statement follows immediately from the fact that the set of all polynomials is countable, while the set of real numbers is uncountable.

Statement 2. There are numbers for which it is impossible to write a computer program that prints all of their digits sequentially (such numbers are often called non-computable, although terminology may vary). Note that I am not saying anything about the time which the program takes for computation. This statement is only about the theoretical possibility of such a program, even in the case when printing the digits would take an infinite amount of time.

This is actually the same argument as in statement 1. Each computer program is just a finite string made up of symbols of a finite alphabet, and as you have proved above, sets of such strings are always countable. The proof of this fact mirrors the proof for the number of polynomials. Please note that this result applies irrespective of the programming language, formal logic or computer architecture we use. As long as programs are strings, non-computable numbers will exist.

Statement 3. The set of all sets cannot exist.

If the set of all sets existed, it would be “the biggest” set of all, since it would have to contain all the elements of all sets. But we know that for an arbitrary set \(A\) its powerset \(2^A\) is “bigger” than \(A\), so there cannot be “the biggest” set.

The last three statements are simple enough to be proven as an exercise.

Statement 4. Any set of pairwise disjoint intervals on the real line is at most countable.

Statement 5. Any real function can have at most countably many strict local extremums.

Statement 6. Any increasing real function can have at most countably many discontinuities.

In fact, there is nothing special about increasing functions: in one of the following blog posts we will generalize the last statement to all functions \(\mathbb{R}\to\mathbb{R}^n\) which have only discontinuities of the first kind.

Impossible puzzle

The following story took place in the Soviet Union, where children started school at the age of 7. It is important to know this for solving the problem.

Two mathematicians who hadn’t seen each other for years met in the park and had the following conversation:

“Hi! How are you? Do you already have kids?”

“Hi! I’m fine, thanks! Yes, I have two kids, both of them are still preschoolers.”

“How old are they?”

“The product of their ages is the number of pigeons around that bench.”

“Hmm. I don’t have enough information to learn their ages.”

“The older takes after his mother.”

“Oh, now I see!”

Question: What are the ages of the children?

I have to admit that when I was given this problem in my teenage years, I wasn’t able to solve it. When I gave up, I was given a hint: “The order of the phrases in the dialogue is important”. This allowed me to solve the problem within minutes. I am not publishing the solution; instead, I’ll present a set of similar problems below, whose statements can also serve as a hint. Feel free to share and discuss the solution in the comment section though.

Many years after I was given this problem, I learned that it is a simplification of a much harder mathematical puzzle. Here it is.

Alice thought of two distinct numbers, both bigger than 1. She told the sum of the numbers to Sam (the sum turned out to be smaller than 100) and their product to Peter. Then the following conversation between Sam and Peter occurred:

“I know that you don’t know the numbers which Alice thought of,” said Sam to Peter.

“Now I know them,” Peter answered.

“Now I know them too,” said Sam.

Question: What are Alice’s numbers?

This problem is much harder than the puzzle with children’s ages stated above. I was able to solve it only by writing a program which enumerated all of the possibilities, but I know that many people do it manually using simple arithmetic tricks and tedious bookkeeping. If you cannot solve the problem yourself, you can check the solution on Wikipedia. Note that the condition for the sum of Alice’s numbers to be less than 100 is critical. It is clear that too wide a range would make the solution non-unique, but what is interesting is that if the range is too small (if the allowed sum is less than 62), the solution cannot be found either. It is quite difficult and instructive to analyze how the number of solutions depends on the allowed range for the sum.
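Since I mentioned solving it by enumeration, here is a sketch of such a brute-force program in Python. The encoding of the three statements as filters is my own reading of the dialogue, so treat it as an illustration rather than the canonical solution.

```python
from collections import defaultdict

# Candidate pairs: distinct numbers, both bigger than 1, sum smaller than 100.
pairs = [(x, y) for x in range(2, 99) for y in range(x + 1, 99) if x + y < 100]

by_sum, by_prod = defaultdict(list), defaultdict(list)
for x, y in pairs:
    by_sum[x + y].append((x, y))
    by_prod[x * y].append((x, y))

# Sam: "I know that you don't know" -- every split of his sum gives an ambiguous product.
def sam_1(s):
    return all(len(by_prod[x * y]) > 1 for x, y in by_sum[s])

# Peter: "Now I know them" -- exactly one factorization is compatible with Sam's statement.
def peter_2(p):
    return sum(1 for x, y in by_prod[p] if sam_1(x + y)) == 1

# Sam: "Now I know them too" -- exactly one split of his sum lets Peter deduce the numbers.
def sam_3(s):
    return sum(1 for x, y in by_sum[s] if peter_2(x * y)) == 1

solutions = [(x, y) for x, y in pairs
             if sam_1(x + y) and peter_2(x * y) and sam_3(x + y)]
print(solutions)  # the classical answer: [(4, 13)]
```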

If this problem still appears too difficult, you can also check the “Cheryl’s Birthday” problem from the Singapore and Asian Schools Math Olympiad. It is another, less complex variation of the same problem (it seems to be the easiest version of all known to me).

There is one more modification of the same problem which is my favorite.

Alice made up three positive natural numbers. She told the sum of these numbers to Sam and the product of these numbers to Peter. Then the following conversation happened:

“If only I knew that the number which you were given is bigger than the number which I was given, then I’d be able to immediately tell what numbers Alice made up”, said Sam.

“No, the number which Alice told me is smaller than yours, and I know all of the numbers”, Peter responded.

Question: What numbers did Alice make up?

This problem is not as immediately straightforward as the problems above, although it is much easier computationally. Once again I am not providing the solution, but you can discuss it in the comments section.

For reference, the answer to the last problem is the numbers 1, 1 and 4. It is quite easy to find these numbers, but you should also be able to demonstrate clearly that they are the only possible solution. This problem was given in one of the Russian math competitions (unfortunately, I’ve lost the reference), and if the correct answer was given but the uniqueness of the solution wasn’t shown, the solution wouldn’t count.

Twitter and Facebook accounts

People were asking me how they can follow this blog. My preferred way is to use RSS, but it may be inconvenient for non-techies.

In order to make following the blog easier, I created a Facebook group and a Twitter account where I am going to publish any updates. I am not sure that this is the best way to keep people updated, so if you have better ideas, please let me know.

God’s existence

You may have seen headlines like “Mathematicians proved the existence of God”. I’ve seen them a couple of times. In this post, we will see two such proofs: a very famous one by Gödel (the first version of which appears in Gödel’s papers from 1941) and a slightly less famous one by Hartshorne (published in 1962).

We’ll use classical higher-order modal logic and first of all I have to explain what it means.

“Classical” in the context of logic means that any statement is either true or false. If \(S\) is an arbitrary statement, it means that \(S\lor\neg S\) is true, and this is an axiom. Here \(\lor\) means “or” and \(\neg\) means “not”. We refer to this as the law of excluded middle.

We will also make use of symbol \(\land\) for “and”, \(\rightarrow\) for implication and \(\leftrightarrow\) for logical equivalence.

Two things about implication may be non-intuitive. First of all, according to the laws of implication, absolutely any statement follows from a false statement. For example, the following seemingly absurd statement is considered true: “\(1+2=14\) follows from the fact that \(2+3=1\)“. It may be uncomfortable to accept this kind of reasoning, but this really is a true implication. I’ll explain it in more detail in one of the following blog posts, but for now I can just say that the ability to derive arbitrary absurdities from false statements doesn’t really affect anything: since the assumption of such an implication is never true, the rule is never actually applied. It is still important to know this for understanding what follows. We’ll refer to this as paradoxical implication.

Let’s say that \(A\) and \(B\) are some statements. The statement \(A\rightarrow B\) is equivalent to \(\neg B\rightarrow \neg A\). You can either learn this from any basic logic text or just think about it for a bit and convince yourself that the equivalence is true. We’ll refer to this as the implication contrapositive rule.
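The equivalence can also be checked mechanically: there are only four combinations of truth values to try. A tiny Python check (my own illustration):

```python
from itertools import product

def implies(a, b):
    # Truth table of material implication: false only when a is true and b is false.
    return (not a) or b

# A -> B has the same truth value as (not B) -> (not A) in all four cases.
assert all(implies(a, b) == implies(not b, not a)
           for a, b in product([False, True], repeat=2))
print("contrapositive rule verified")
```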

If a statement \(S\) depends on some object \(x\) and we want to say that an object satisfying \(S\) exists, we write \(\exists x. S(x)\). If the statement \(S\) is true for an arbitrary object, we write \(\forall x. S(x)\). The second classical assumption is that \(\neg\exists x.S(x)\leftrightarrow\forall x. \neg S(x)\). It is just an axiom, and you can convince yourself that it makes sense by exploring a couple of examples.
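Over a finite domain the duality becomes a checkable fact: “no \(x\) satisfies \(S\)” is the same as “every \(x\) fails \(S\)”. The domain and predicates below are arbitrary examples of mine:

```python
domain = range(10)
predicates = [lambda x: x % 2 == 0,   # holds for some elements of the domain
              lambda x: x > 100]      # holds for none of them

for S in predicates:
    not_exists = not any(S(x) for x in domain)   # "there is no x with S(x)"
    forall_not = all(not S(x) for x in domain)   # "for all x, not S(x)"
    assert not_exists == forall_not
print("quantifier duality verified on the finite domain")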

In the statement \(S(x)\), \(S\) is called a predicate, while \(x\) is an object (in the language of logic it is called a “term”, but we will stick to more non-logician-friendly terminology). First-order logic (the one which is studied in universities) allows us to apply the quantifiers \(\exists\) and \(\forall\) to objects, but not to predicates themselves. We are also not allowed to apply predicates to other predicates. Higher-order logic relaxes this requirement, so we may write statements like \(\forall S .\exists x. S(x)\).

Now the interesting part: modal logic. As I said above, we assume that any statement is either true or false. Modal logic goes further and says that some statements are possibly true and some are necessarily true. We denote these with the symbols \(\diamond\) and \(\Xi\) respectively:

  1. Statement is “just” true: \(S\).
  2. Statement is possibly true: \(\diamond S\).
  3. Statement is necessarily true: \(\Xi S\).

Unfortunately, I am not using standard notation here. “Necessary truth” is traditionally denoted by a hollow square, but this symbol is unavailable in the tool which I use for writing formulas. I hope I’ll be able to fix it in the future and come back to this article to change it.

One way to think about these three modes of truthfulness is to assume that we have many possible worlds. They can be imaginary worlds or some physical alternative realities. We live in one of these worlds. Given this assumption we can interpret these formulas this way:

  1. \(S\): the statement is true in our world, but not necessarily in others.
  2. \(\diamond S\): the statement may be true in some of the worlds, but not necessarily in our world. What is important is that we can imagine the world where \(S\) is true.
  3. \(\Xi S\): the statement is true in all possible worlds we can imagine.

Modal logic has a number of axioms:

M1. \(\neg \Xi S \leftrightarrow \diamond \neg S\). If it is not true that something holds in every world, then there exists a world where it is false. We can treat this first axiom as a definition of \(\diamond\) through \(\Xi\).

M2. \(\Xi S \rightarrow S\). If something is necessarily true, then it is true in our world as well.

M3. \((\Xi (S \rightarrow T)) \rightarrow ((\Xi S) \rightarrow (\Xi T))\). If in all worlds \(T\) follows from \(S\), then if in all worlds \(S\) is true we also know that in all worlds \(T\) is true as well.

M4. If \(S\) can be proven mathematically (which includes the case when \(S\) is an axiom), then \(\Xi S\).

M5. \(\diamond \Xi S \rightarrow \Xi S\). If \(\diamond\Xi S\) is true, it means that there exists some world in which \(\Xi S\) holds. If we could transfer ourselves to that world, we would know that \(S\) is necessarily true, i.e. true in all worlds, including ours. So we conclude \(\Xi S\).

Applying the implication contrapositive rule (together with M1) to M5 gives us the formula \(\diamond \neg S \rightarrow \Xi \diamond \neg S\). Denoting \(\neg S\) by \(T\), we get an equivalent statement of the same axiom:

M5′. \(\diamond T \rightarrow \Xi \diamond T\)

Modal logic often uses other axioms as well, but we won’t need them.

Now we are ready to prove the existence of God following Hartshorne. I’ll loosely follow the paper by Small here.

Let’s denote statement “God exists” by \(g\). Hartshorne assumed two axioms:

HA1. \(\diamond g\). God may exist (he assumed that even atheists can imagine the existence of God in some alternative reality).

HA2. \(\diamond\neg g \rightarrow \neg g\). This is quite a pessimistic assumption: from the fact that God may fail to exist in some other reality, we immediately conclude that he doesn’t exist in our own reality. (In fact, I am tricking you right now and abusing the original axiom, but I’ll explain this a little bit later.)

Now we are ready for the proof. Below if I don’t mention explicitly where we derive a formula from, it implies that we use the previous formula.

H1. First, let’s substitute \(\neg g\) into axiom M5′: \(\diamond\neg g \rightarrow \Xi\diamond \neg g\). Note that we are not making any assumptions about God here, only using tautologies of modal logic.

H2. Substitute \(\Xi g\) into the law of excluded middle and use M1: \((\Xi g)\lor(\diamond \neg g)\).

H3. Applying implication H1 to the second disjunct of H2, we get \((\Xi g) \lor (\Xi\diamond\neg g)\).

H4. Since we assume HA2 as an axiom, we can apply rule M4 to get \(\Xi(\diamond\neg g \rightarrow \neg g)\).

H5. By M3: \((\Xi\diamond\neg g) \rightarrow (\Xi\neg g)\)

H6. H5 and H3 together give us \((\Xi g) \lor (\Xi\neg g)\). This eliminates the possibility of God existing merely accidentally: either God exists in all possible worlds, or in none of them.

H7. Axiom HA1 is equivalent to \(\neg\neg\diamond g\) (two negations cancel each other), to which we can apply M1 to get \(\neg\Xi\neg g\).

H8. From H7 we know that the term \(\Xi\neg g\) in H6 cannot be true. So the only possibility for H6 to be true (and we know that it is true because we proved it) is that \(\Xi g\) is true. This shows that God necessarily exists, which completes the proof.
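The whole argument can also be sanity-checked semantically. In the possible-worlds picture with a universal accessibility relation (an assumption of mine, matching the informal reading of M1–M5 above), a valuation of \(g\) is just a choice of the worlds where \(g\) holds, and we can enumerate all of them:

```python
from itertools import product

def necessarily(v):
    return all(v)   # g holds in every world

def possibly(v):
    return any(v)   # g holds in some world

def argument_is_sound(n_worlds):
    """Check there is no valuation where HA1 and HA2 hold but the conclusion fails."""
    for v in product([False, True], repeat=n_worlds):
        neg_v = tuple(not t for t in v)
        ha1 = possibly(v)  # HA1: God may exist
        # HA2 taken as an axiom, i.e. true at every world w:
        # (possibly not-g) -> (not-g at w)
        ha2 = all((not possibly(neg_v)) or (not t) for t in v)
        if ha1 and ha2 and not necessarily(v):
            return False   # counter-model found: premises hold, conclusion fails
    return True

print(all(argument_is_sound(n) for n in range(1, 7)))  # True
```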

Does this really prove that God exists? It depends on whether you agree with Hartshorne’s axioms and the axioms of modal logic. In fact, this proof doesn’t even say what “God” is. It can be applied to any statement which satisfies the same set of axioms. For example, substitute “God doesn’t exist” for \(g\), and the same reasoning would prove that God doesn’t exist.

Also, it is interesting to note that axiom HA2 is equivalent to the statement \(g \rightarrow \Xi g\) (by the implication contrapositive rule and M1), which reads as “if God exists in our world, he must exist in every possible world”. This statement looks much stronger than the original axiom and seems less likely to be true, even though the two formulations are totally equivalent! Actually, Hartshorne used exactly this “less pessimistic” axiom, but I chose to change it to make it look less shaky, at least initially.

Can this proof be considered mathematical? Once again, it depends on what you call “mathematics”. It definitely doesn’t use any advanced mathematical concepts and is quite trivial. The only bit of mathematics here is the usage of formulas and a strict agreement on the axioms and on which logical derivations we consider acceptable. This is what mathematics actually is: an agreement on the rules which we use and a convenient language which helps our reasoning. Exactly the same argument could be repeated without any formulas, but then it would be much more difficult to follow. In fact, a version of this proof was originally presented informally by Anselm in the 11th century, and it was made formal only in 1962 by Hartshorne (although, honestly, I find the relation to be quite loose).

Does this proof have any value? Of course! It is at least a nice exercise in modal logic and an interesting model to think about. As was discussed in the post on models, even if some model is not correct, it can still give food for thought. In this particular case, the formal model made obvious how shaky Anselm’s argument was (which was clear anyway, but this model makes the proof and the vision of its weaknesses accessible to non-philosophers).

Now, after a short warm-up with modal logic, we are ready to discuss Gödel’s proof. In comparison to Hartshorne’s proof, this one also tries to explain what God actually is. Informally speaking, Gödel assumes that each imaginable object has a set of properties and is uniquely described by that set (here I don’t assume any definition of “set” in the set-theoretic sense). This reflects what physicists do when they introduce some object. For example, in order to explain what an “electron” is, you have to specify its properties: its charge, mass, diameter, etc.

Each property is binary: an object either has a property or not; it cannot have some degree of a property. This is not limiting at all, since properties can be arbitrarily precise (like “the object is 1.23 nanometers in diameter”); it is just a consequence of the syntax we use. If an object \(x\) has a property \(\phi\), we denote it as \(\phi(x)\). Also, some of the properties are “positive” and some are not. If \(\phi\) is positive, we denote it as \(P(\phi)\). What “positive” means is not specified, but God is defined as an object that has all positive properties.

It is possible that for some property \(\phi\) there is no object which has it (consider the property “this object does not exist”). If this is not the case and there is some object \(x\) such that \(\phi(x)\) holds, we call \(\phi\) instantiable and \(x\) an instance of the property \(\phi\).

In the following, I’ll follow the notation and statements order of Wikipedia.

Axiom 1. \((P(\phi)\land \Xi \forall x . \phi(x)\rightarrow \psi(x)) \rightarrow P(\psi)\)

The formula \(\forall x . \phi(x)\rightarrow \psi(x)\) means that property \(\psi\) follows from property \(\phi\): any object which has property \(\phi\) also has property \(\psi\). We can also call \(\psi\) a subproperty of \(\phi\). As an example, consider the properties “bigger than 1 meter” and “bigger than 1 millimeter”: the second clearly follows from the first. What Axiom 1 says is that only positive properties can follow from a positive property. It cannot be the case that some positive property implies a negative one.

Axiom 2. \(P(\neg\phi)\leftrightarrow\neg P(\phi)\)

For any property \(\phi\) we have the opposite property, denoted by \(\neg\phi\), which reads simply as the absence of \(\phi\). Axiom 2 says that if the presence of some property is a positive thing, then its absence is not, and vice versa. It means that we cannot have neutral properties (or we just don’t consider such properties).

Theorem 1. For any positive property there may exist an object which has this property (in other words, any positive property is instantiable in some possible world): \(P(\phi)\rightarrow \diamond \exists x . \phi(x)\)

Proof: Let’s assume the opposite, i.e. \(\neg\diamond\exists x.\phi(x)\): in no possible world is there an object with property \(\phi\). Then in every world the statement \(\phi(x)\) is always false, and any statement follows from it by paradoxical implication. So we have \(\Xi\forall x. \phi(x)\rightarrow \psi(x)\) for an arbitrary property \(\psi\): every property, together with its opposite, is a subproperty of \(\phi\). If \(\phi\) were positive, then by Axiom 1 both \(\psi\) and \(\neg\psi\) would be positive, which contradicts Axiom 2. We conclude that no positive property can be non-instantiable. QED

Definition 1. \(G(x)\leftrightarrow \forall\phi . (P(\phi)\rightarrow \phi(x))\)

\(G\) is a property which we’ll call “God-like”. This definition says that being God-like is equivalent to having all positive properties (and hence no negative properties, otherwise Axiom 2 would be contradicted).

Axiom 3. \(P(G)\) (being God-like is a positive property).

Theorem 2. \(\diamond \exists x . G(x)\) (it is possible that God-like object exists at least in some world).

Proof: This theorem is trivial since God-likeness is a positive property and we know that all positive properties are instantiable by Theorem 1. QED

Definition 2. \(\phi \;\mathrm{ess}\; x \leftrightarrow (\phi(x)\land (\forall \psi . \psi(x)\rightarrow \Xi \forall y . \phi(y)\rightarrow\psi(y)))\)

This defines the essence of an object. Informally speaking, the essence of an object is a property which determines the object uniquely up to a world. Let’s say that \(x\) and \(y\) are the same object but in different worlds. We cannot write \(x=y\), because these two objects are still not exactly the same: they live in different worlds. But if you had a way to bring these objects into the same world, they would become indistinguishable and effectively the same physical entity. In order to capture this idea, Gödel introduced “essences”: properties which describe objects uniquely within a world.

Given this informal reasoning, you now can understand what the formal definition means. It just says that if some property \(\phi\) is the essence of an object \(x\), then any other object \(y\) which has the same essence \(\phi\), would have exactly the same set of properties in any world imaginable. As I said before, the idea here is that properties describe objects uniquely, but belonging to a world is not a property which can be used to identify objects across the worlds. Definition of essence does the trick.

Axiom 4. \(P(\phi)\rightarrow \Xi P(\phi)\) (if some property is positive, it is positive in all imaginable worlds).

Theorem 3. \(G(x) \rightarrow G\;\mathrm{ess}\; x\) (if \(x\) is God-like, then \(G\) is its essence)

Proof: We know by definition that being God-like determines all of the properties of an object uniquely (namely, all positive properties). Since the positivity of a property is the same in all worlds (by Axiom 4), all God-like objects have exactly the same properties in all worlds. This is what essence is all about (although it might require some reflection). QED

Definition 3. \(E(x) \leftrightarrow \forall \phi .( \phi \;\mathrm{ess}\; x \rightarrow \Xi \exists y . \phi(y))\)

The property \(E\) is called necessary existence. If an object \(x\) necessarily exists, it means that in any imaginable world there exists some object with the same essence. It is instructive to compare this with the \(\Xi\) operator. While \(\Xi\) says that some logical statement is true in every possible world, \(E\) says that in every possible world there exists some object with exactly the essence of a given object.

Axiom 5. \(P(E)\)

This is another controversial axiom, which says that necessary existence is a positive property. One way to think about it is that positivity of a property is not about being “better”, but about being “greater” in the sense of being bigger, smarter, more powerful, containing more energy, having higher speed, etc. These examples are arbitrary; nothing actually says that being “greater” cannot mean “smaller in size”, but if we continue this intuitive idea, then we can think that the more worlds an object can exist in, the greater it is. This is what this axiom is about.

Theorem 4. God exists (more precisely, in any world exists a God-like object): \(\Xi\exists x. G(x)\)

Proof: Since a God-like object has all positive properties, it also has the property of necessary existence, which gives us formula $$\forall x.G(x)\rightarrow E(x)$$

If we combine this with theorem 2 we get $$\diamond \exists x. G(x)\land E(x)$$

Since \(G(x)\) implies \(G\;\mathrm{ess}\;x\) (Theorem 3), the definition of necessary existence gives $$\diamond \exists x. G(x)\land \Xi\exists y.G(y)$$

If an object has two properties, it certainly has each of them, so we can drop the first conjunct and get $$\diamond \exists x. \Xi\exists y.G(y)$$

or, since \(x\) is not used in this formula anymore, we drop it and get $$\diamond\Xi\exists y.G(y)$$

Applying axiom of modal logic M5 we get the statement of the theorem. QED

Did this prove that God really exists? Not really. At least not in the popular religious sense. Next time you see in a newspaper that “scientists proved that God exists”, you will know what the scientists really did and how it is related to the God which you or your neighbor worship. Most likely, it is not related at all, and at best it is nothing more than a language game played by formal rules.

Once again, even though this proof doesn’t tell us much about the reality in which we live, and even though each of the axioms can be criticized, I still find the proof quite elegant. The play with essences is smart and may even be used in absolutely unrelated modal logic derivations.

In order to get a better feeling for this proof and for modal logic, I suggest you solve the following exercise:

Exercise. Let’s assume that the definition of essence is formulated as

$$\phi \;\mathrm{ess}\; x \leftrightarrow (\forall \psi . \psi(x)\rightarrow \Xi \forall y . \phi(y)\rightarrow\psi(y))$$

and all of the axioms stay the same. Prove that in this case Gödel’s axioms are self-contradictory. (Hint: start by proving that a non-instantiable property is the essence of any object and derive a contradiction from this.)

Gorky’s description of a mathematician

I was reading “My Universities” by Maksim Gorky yesterday, and I found a curious description of a mathematician who was his neighbor in 1884:

[…] Three rooms opened on the corridor, two occupied by prostitutes and the third by a consumptive mathematician, tall and cadaverous, and topped with coarse, red hair. Through holes in the dirty rags that barely covered him, one shuddered to see his corpselike ribs and bluish skin.

It would appear that the mathematician sustained himself chiefly on his own nails, which he chewed down till they bled. Day and night he was at his formulas, mumbling numbers, and coughing with dry rasps. The prostitutes were uneasy about him, considering him insane, but out of pity they put bread, tea and sugar outside his door. As he carried them in he snorted like a tired horse. If, for some reason, these gifts failed to appear he would stand in the doorway croaking indignantly, “Bread! Bread!”

In their dark caverns his eyes gleamed with maniac pride, happy in the assurance of his greatness. He had an occasional visitor, a hump-backed cripple with a gray beard, thick lenses straddling his swollen nose, and a sharp smile on his yellow, eunuch’s face. Behind the shut door the two would sit for hours in an incomprehensible silence. But once, late in the night, I was awakened by mad screams from the mathematician, “A prison! I say, a cage, a mousetrap! Geometry is a prison!”

The humpbacked dwarf’s retort was a shrill giggle and a word I could not make out, repeated several times.

“Damn you! Get out!” the mathematician roared.

His expelled visitor stood in the corridor, sputtering and pulling his cape about him, while the mathematician, gaunt and terrifying, in the doorway, his fingers writhing in his hair, shouted, “Euclid’s a fool! A fool! God’s wiser than the Greek. I’ll show you!” And he banged the door so hard things in his room crashed down.

The mathematician’s aim, I discovered, was to prove God’s existence mathematically. He died with the task unfinished.

I think I’ll end up the same way.

Comparative advantage and usefulness of models

Since I still haven’t set up LaTeX, this time we’ll do some economics. Let’s consider two countries, which we will call England and Portugal (any coincidences are accidental). England produces 1 unit of wine per worker per day, Portugal produces 2 units of clothes per worker per day. We can represent this in a table:

           England   Portugal
Wine          1         -
Clothes       -         2

It makes sense that given this setting England will sell wine to Portugal and Portugal will sell clothes to England. Both countries benefit from this. We may also try to draw some simple conclusions from this. For example, one may suggest that the price of a unit of wine would be approximately twice the price of clothes.

It is easy to see why. Let’s assume that the price of a unit of wine is $50 and the price of a unit of clothes is $20. This means that one worker in Portugal makes $40 a day, while one worker in England makes $50 a day. There is an incentive to move from Portugal to England to make $10 more, so if this move is easy enough, quite soon few people would produce clothes, and the price of clothes would go up. We can make a similar argument if the price of wine were less than twice the price of clothes.
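The incentive arithmetic can be stated in a couple of lines (the $50 and $20 prices are the hypothetical ones from the paragraph above):

```python
# Daily income = price of a unit * units produced per worker per day.
price_wine, price_clothes = 50, 20  # hypothetical prices from the text
income_england = price_wine * 1       # England: 1 unit of wine per day
income_portugal = price_clothes * 2   # Portugal: 2 units of clothes per day

print(income_england, income_portugal, income_england - income_portugal)
# 50 40 10: the $10-a-day incentive to move from Portugal to England
```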

The reality is more complex than this model. We have to take into account not what one particular worker can produce, but what a whole factory can produce. We also have to consider the cost of production, taxes, salaries and many other factors. None of this changes the theoretical output of the model much: substitute “a worker moves” with “a company moves its business” and you get exactly the same argument about price balance as above. I’ll keep talking about the productivity of individual workers below just to make the explanation a little more accessible, but all of what I am going to say can be made more precise.

Let’s make our model more complex. Now both countries produce wine and clothes (here and below we assume the equal quality of products), but the productivity is not equal:

           England   Portugal
Wine          1         3
Clothes       3         2

I think it is obvious that in this case England would sell clothes to Portugal and Portugal would sell wine to England. At this point, the analysis of prices becomes considerably more complicated, but we are not interested in it. The only important point here is that both countries would trade for mutual advantage.

Let’s consider this situation:

           England   Portugal
Wine          1         5
Clothes       3         4

Now Portugal is much better at producing both goods. There is no way for England to export products, since both wine and clothes are more expensive to produce in England than in Portugal. At the same time, Portugal could sell its products to England, but England wouldn’t be able to afford to pay for them: if you want to buy something, you first have to sell something, and as we have seen, England is not selling anything. It seems that England is in a dreadful position where either all of the business moves to Portugal, or the trade between these countries is banned and England simply has higher prices and a lower quality of life.

In 1817 David Ricardo rightfully challenged the argument which I presented above with the following example (he used England and Portugal as well). Let’s say that one worker in Portugal switches from producing clothes to producing wine. Portugal’s market will have 4 fewer units of clothes, but 5 more units of wine. Then two workers in England switch from producing wine to producing clothes. After this, England has 6 additional units of clothes and 2 fewer units of wine.

Now the two countries can actually trade. If Portugal sends 3 units of wine to England while England sends 5 units of clothes to Portugal, both countries benefit from the trade. Here it is in a table:

|                | England                                          | Portugal                                      |
|----------------|--------------------------------------------------|-----------------------------------------------|
| Workers switch | 2 workers switch: -2 wine, +6 clothes            | 1 worker switches: +5 wine, -4 clothes        |
| Trade          | sells 5 clothes to Portugal: +1 wine, +1 clothes | sells 3 wine to England: +2 wine, +1 clothes  |

It is surprising, isn't it? Through trade, we've produced an additional 3 units of wine and 2 units of clothes for these two countries to share, in a situation where any trade seemed impossible.
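
The bookkeeping above is simple enough to check with a short script. This is just a sketch: the numbers are the units-per-worker figures from the second table, and the worker counts and shipment sizes are the ones from Ricardo's example.

```python
# Units produced per worker, taken from the second England/Portugal table.
output = {
    "England":  {"wine": 1, "clothes": 3},
    "Portugal": {"wine": 5, "clothes": 4},
}

# Net change in each country's stock of goods, starting from zero.
england  = {"wine": 0, "clothes": 0}
portugal = {"wine": 0, "clothes": 0}

# Portugal: 1 worker switches from clothes to wine.
portugal["wine"]    += 1 * output["Portugal"]["wine"]     # +5 wine
portugal["clothes"] -= 1 * output["Portugal"]["clothes"]  # -4 clothes

# England: 2 workers switch from wine to clothes.
england["wine"]    -= 2 * output["England"]["wine"]       # -2 wine
england["clothes"] += 2 * output["England"]["clothes"]    # +6 clothes

# Trade: Portugal ships 3 wine to England; England ships 5 clothes to Portugal.
portugal["wine"] -= 3; england["wine"] += 3
portugal["clothes"] += 5; england["clothes"] -= 5

print(england)   # {'wine': 1, 'clothes': 1}
print(portugal)  # {'wine': 2, 'clothes': 1}
```

Both countries end up strictly ahead, exactly as the table shows.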

Here is what's going on. It is true that Portugal has an absolute advantage in both industries, but the relative productivity of producing clothes is still higher in England than in Portugal. England is 3 times more efficient at producing clothes than at producing wine, while Portugal is only 1.25 times more efficient at producing wine than clothes, so shifting English labor from wine to clothes gains more than shifting Portuguese labor from clothes to wine loses. This difference in relative productivity is what makes mutually beneficial trade possible.

This argument was widely used as a justification for international trade, and I hope I have convinced you that such trade is actually beneficial. Now let's look at a slightly more realistic example:

| Output per worker | Somalia | Switzerland |
|-------------------|---------|-------------|
| Computers         | 1       | 5           |
| Potatoes          | 3       | 4           |

Now it should be clear that comparative advantage pushes the poor country to focus on agriculture, while the developed country focuses on hi-tech. This way the poor country has no chance of getting out of poverty, while the rich country only gets richer. The trade deal which looked perfect in the case of England and Portugal above looks like a disaster when applied to poor and rich countries.

What does the Ricardian model teach us? Is trade good or bad? Does Ricardo's model even work? In most situations it doesn't give any concrete answers (although politicians can still use it; those folks can use any kind of trick anyway), and it doesn't model real-life situations well. It has too many implicit assumptions, which makes it difficult to apply. First of all, the model considers only two countries and two products. It doesn't take into account demand for these products, the cost of transportation, competition on the local market and dozens of other factors. There have been many attempts to make this model more realistic, but I am not sure any of them could be called successful.

Should we throw this model away? Not exactly. Even though it doesn't work all that well, the model is non-obvious, and no one before Ricardo was able to think about trade from this point of view. Even though it doesn't give us many answers, it motivates us to ask many interesting questions which weren't asked before it was proposed. The model fueled follow-up research for centuries. It can still be applied in some very limited contexts, since it is logically sound. And some people think it can yet be refined to the point where it applies directly.

Here's an important lesson: a good model is not necessarily one which reflects reality well. Other, less easily measured qualities of a model, such as the ability to raise new questions, may be even more important than the correctness of the model itself.

This applies not only to mathematical models. Newtonian mechanics is, strictly speaking, incorrect: it works only under the very specific conditions of the ridiculously low velocities of everyday life, and only within measurement error. This doesn't make Newtonian mechanics bad; it has proved quite useful for almost all engineering so far.

Another example is Gödel's proof of God's existence. The proof rests on such shaky foundations that Gödel himself didn't even try to publish it during his lifetime. And even though the foundations of this work turned out to be flawed, the proof inspired serious developments in mathematical logic and even affected research in artificial intelligence.

The most recent example of this kind, which actually motivated this post, is Theodore Hill's "Evolutionary Theory for Variability Hypothesis". In short, the model explains mathematically why there are more idiots, as well as more geniuses, among males than among females. The work is based on extremely shaky foundations, and the model is certainly incorrect, at least in the case of the human population, but that doesn't make the model bad! Quite the opposite: the model is still logically sound, its shaky assumptions can potentially be refined in the future, and even if it turns out the model cannot be fixed at all, it can still raise questions which weren't asked before.

Unfortunately, in this particular case the refinements won't be made and the questions won't be asked, because the paper was effectively banned. What is so wrong with the model that it had to be banned? The arguments are purely political: the result can be used by right-wing propaganda, it can discourage women from going into tech, and, after all, it doesn't match the idea that men and women are equal. No purely scientific arguments against this paper were presented, but the political ones were enough for the ban. Unfortunately, this is not the first time scientific results have been banned on political grounds. It is harsh to say, but I cannot resist comparing this situation to the suppressed research in genetics, informatics and other fields in the Soviet Union, and to Aryan Physics in Nazi Germany. I would really like to provide other examples instead of these two, but it seems that in the modern world other examples of banning science just don't exist; the case of Hill's model is merely the third such case in modern history. I hope this situation won't go that far, but the direction Western academia has taken is worrying.