Unexpected expected value

Let’s say that random variables \(x_i\) are distributed uniformly on segment \([0;1]\). How many such variables we have to add to get a sum bigger than 1? Formally, if \(n\) is a minimum value such that \(\sum_{i=1}^{n}x_i > 1\), what is the expected value of \(n\)? Surprisingly, it is \(e\).

Let \(m\) be a natural number. We denote the probability that we need more than \(m\) variables so that their sum exceeds 1 as \(p_n\). Geometrically, this probability is just the volume of an \(n\)-dimensional simplex \(X = \{x | \sum_{i=1}^n x_i < 1\}\). In particular, \(p_0 = 1\). This volume can be calculated directly, but there is an elegant probabilistic proof which I’ll demonstrate.

Consider the random variables

$$\begin{eqnarray}y_1 &=& x_1\\ y_2 &=& x_1 + x_2\\ y_3 &=& x_1 + x_2 + x_3\\ \ldots\\ y_m &=& x_1 + x_2 + \ldots + x_m \end{eqnarray}$$

It is obvious that this sequence is increasing and that \(x\in X\) iff \(y \in Y = \{0 \le y_1 \le y_2 \le \ldots \le y_m \le 1\}\). The linear operator which takes \(\{x_i\}\) to \(\{y_i\}\) has determinant 1, so the volumes of \(X\) and \(Y\) are exactly the same.

The volume of \(Y\) is easily calculated. It is the probability that \(m\) random values would appear in a sorted order, which is \(\frac{1}{m!}\). We conclude that \(p_m = \frac{1}{m!}\).

The probability that the sum of the first \(m\) variables exceeds 1 is \(q_m = 1 – p_m\), so the probability that we need exactly \(m\) variables to exceed 1 is given by \(q_m – q_{m-1} = p_{m-1} – p_m\). The expected number of variables which we have to add is then

$$E\left[n\right] = \sum_{m=1}^\infty m (p_{m-1} – p_m)$$

We can get an easier equivalent expression if we consider how we can rearrange a partial sum:

$$\begin{eqnarray} \sum_{m=1}^k m(p_{m-1} – p_m) &=& p_0 – p_1 + 2p_1 – 2p_2 + 3p_2 – 3p_3 + \ldots + kp_{k-1} – kp_k\\ &=& p_0 + p_1 + p_2 + \ldots + p_{k-1} – kp_k\\ &=& \left(\sum_{m=0}^{k-1} \frac{1}{m!}\right) – kp_k \end{eqnarray}$$

Since \(kp_k = \frac{k}{k!} \to 0 \), we conclude that

$$E\left[n\right] = \sum_{m=0}^\infty \frac{1}{m!} = e$$

Arbitrary distribution using biased coin

Problem. Let’s assume that you have a biased coin, but you don’t know the probabilities of its sides. Using the coin, how would you choose one of the numbers 1, 2 or 3 with equal probability?

Stop reading here and try solving the problem yourself before moving on.

Solution. The problem becomes easier if we split it into two parts. Firstly we will learn how to emulate a fair coin using a biased one. After this we describe a procedure which allows us to get numbers from an arbitrary probabilistic distribution.

Let’s denote the probability of tails as \(p\). Then if we toss the coin twice, we can get one of the four outcomes:

  1. Tails both times, with probability \(p^2\),
  2. Heads both times, with probability \((1-p)^2\),
  3. Tails first time, heads second time, probability \(p(1-p)\).
  4. Heads first time, tails second time, probability \(p(1-p)\).

As you see, outcomes 3 and 4 have equal chances to appear. Thus, in order to emulate an experiment with probability 50/50 (fair coin), we can toss the coin twice. If both tosses give the same outcome, then discard it and toss again. When we get tails in one toss and heads in the other, we choose the outcome which comes first.

Example. Consider a sequence of tosses T, T, H, H, T, H, where I denoted tails as T and heads as H. This sequence describes an experiment in which we toss the coin twice and both times we get tails. We discard this result and toss twice again. This time we get two heads. Now, we toss twice one more time. First, we get tails and then heads. We write “tails” as our outcome.

Since now we can assume that we have a fair coin. We will be repeating tosses sequentially and interpreting them as digits in a binary representation of a number \(s\) in interval \([0, 1]\):

$$s=\sum\limits_{i=1}^{+\infty} \frac{t_i}{2^i}$$

Here \(t_i\) denotes the outcome of i-th toss and can be either 0 or 1. In order to get one of the three values at random, we split \([0,1]\) into three subintervals \([0, 1/3), [1/3, 2/3], (2/3, 1]\). (Since particular points like 1/3 has zero probability, we don’t care into which interval we include them). The outcome of the experiment is the number of the subinterval in which \(s\) lies. Since we assume that each digit of \(s\) is uniform, \(s\) itself is uniform in \([0,1]\).

In fact we don’t need the exact value of \(s\). The only interesting bit is the subinterval in which \(s\) lies, so we don’t have to make an infinite number of tosses. Let’s say that the first toss gave us value 1. We know that \(s=0.1\ldots\) (in binary) and it lies somewhere in \([1/3, 2/3]\). If the next toss gives us 0, then \(s=0.10\ldots\) and \(s\in[1/3, 0.59]\). If the next toss gives us 0 again, then \(s=0.100\ldots\) and we know that \(s\in[1/3, 0.46)\) which is a subinterval of \([1/3, 2/3]\). The outcome is the second value.

This scheme can be generalized in an obvious way. If we want to get an arbitrary distribution we just have to split \([0,1]\) into a given number of subintervals of lengths that match the probabilities of the values.

Infinite sets

In this post, I’ll cover more or less the basic topics which are usually covered in any set theory course. I expect that the theorems presented in this post will be used extensively in my next articles and I will be referencing this post a lot.

I am skipping many formal details getting straight to the main idea of proofs, but all of the proofs can be made strict without much effort. I also present simpler results without proofs in the form of exercises. I am too lazy to prove each bit of trivialities, sorry.

We will start from a seemingly absurd statement.

Theorem. A unit line segment \([0, 1]\) contains exactly the same number of points as a filled unit square \([0, 1]^2\).
Proof. Consider a decimal fraction representation of points of the unit segment and let function \(f\) map these points to a unit square by taking all odd positioned digits of decimal representation to the first coordinate of the square and all even positioned digits to the second coordinate of the square. This cryptic rule can be made clear with the help of an example. Point \(0.1234567\) has odd-positioned digits 1, 3, 5, 7 and even-positioned digits 2, 4, 6 in its decimal representation, so \(f\) will take this point into a point with coordinates \((0.1357, 0.246)\). It is easy to see that this mapping is one to one. QED.

Exercise. As promised, I provide quite informal proofs and sometimes admit holes in them. The proof above has one such hole. Can you see it? Can you resolve it? (Hint: any number has two decimal representations, for example number 0.2 can be also represented as an infinite fraction 0.19999…).

This proof can be easily extended to the case of a unit cube, or even a unit n-dimensional cube. Instead of mapping a point to two coordinates, map it to \(n\) coordinates and use each n-th digit of decimal representation for this purpose.

This result is quite counter-intuitive and may seem even wrong. After all, a line segment has area 0, but a unit square has area 1. A unit square has volume 0, but a unit cube has volume 1. A unit square consists of an infinite number of unit line segments, while a unit cube consists of an infinite number of squares. How can it be that these three figures have the same number of points?

The catch here is in the terminology. When we say “line segment”, “square” or “cube” we immediately visualize geometric objects. The same terminology as in geometry is used in set theory, but set theory lacks all of the geometric sense. Things like “length”, “angle” or “area” are just not defined in set theory and we cannot make any references to them.  Without these notions, set-theoretic figures are not exactly the same as geometric figures, even though we can describe them using coordinates.  In particular, you are not supposed to draw set-theoretic figures on a sheet of paper, since any picture immediately suggests geometric information which doesn’t exist. As soon as you drop all of your geometric intuition, the only remaining way of dealing with these figures is to analyze functions that map points between them.

The same consideration applies to physical intuition. Mathematical objects don’t usually have mass, atoms inside, density etc, so when we stretch objects, they don’t become less dense. As an example, take function \(f:x\mapsto 2x\) which maps interval \([0,1]\) to \([0,2]\). Clearly, \([0,1]\) is a subset of \([0,2]\), but at the same time \(f\) is one-to-one, so each point in \([0,1]\) can be mapped uniquely with a point in \([0,2]\) and vice versa. Thus from one point of view \(f\) stretches the interval two times, since any subinterval is mapped to a subinterval of twice the length (e.g. \([0.1,0.2]\) is mapped to \([0.2, 0.4]\)). At the same time its image is not less “dense” than its preimage, so in this sense the function just doubles the interval. Once again, all of this seems paradoxical only until you appeal to physical intuition. As soon as you accept the idea that sets have nothing to do with geometry or physics, you should be fine with these abstractions.

The preceding paragraph can be summarized as a rule: stretching of an object doesn’t change the number of points in it.

To make this reasoning more clear, consider a set of integers. Are there more integer numbers than even numbers? From one point of view yes, since any even number is an integer, but not any integer is even. At the same time function \(x\mapsto 2x\) provides a way to match each integer with exactly one even number and vice versa. This makes us conclude that the number of integers and even integers is the same.

Which response is correct? In fact, both of them could be correct, depending on how you define “more”. As soon as we leave the world of finite sets, we cannot really say that one set has more elements than another in the same sense as we compare sets of 2 and 3 elements. Looking into subsets is not very productive though. We know that \([0,1]\subset[0,2]\), but at the same time \([100, 101]\not\subset[0,2]\), even though it is the same set moved on a real line. We clearly don’t want moving sets around affecting our reasoning about sizes of sets. Thus, it appears to be more natural when we use one-to-one mappings to compare cardinality of sets and so this definition is usually used.

Definition. Function \(f\) is called injection, if it maps different elements into different elements. Formally, for injective function, if \(f(x)=f(y)\) then \(x=y\).

Definition. We say that set \(A\) is not smaller than set \(B\) (or has not smaller cardinality, or has not less elements) and denote it as \(|A|\le|B|\) if there is an injection from \(A\) to \(B\). We say that set \(A\) has the same number of elements as \(B\) (or the same cardinality, or the same number of elements) and denote it as \(|A|=|B|\) if there is a one-to-one mapping between \(A\) and \(B\).

Definition. We say that set \(A\) has the cardinality of the continuum if there exists a one-to-one mapping between \(A\) and a unit segment. We denote this as \(|A|=\mathfrak{c}\).

Schröder-Bernstein Theorem. If \(|A|\le|B|\) and \(|B|\le|A|\), then \(|A|=|B|\).

The statement of the theorem seems quite intuitive, but given all of the subtlety of infinite sets, this requires a formal proof. I am skipping it, but if you are interested you can check one of the approaches on Wikipedia.

Exercise. Prove that the whole real line \(\mathbb{R}\) has the cardinality of the continuum.

Exercise. Prove that an arbitrarily dimensional space \(\mathbb{R}^n\) has the cardinality of the continuum.

Exercise. Given \(|A|\le|B|\le|C|\) and \(|A|=|C|\), prove that \(|A|=|B|\).

Consider an arbitrary figure in n-dimensional space. If this figure has a subset of continuum cardinality, we end up in the situation of the exercise above. Thus we conclude that any figure in any finite-dimensional space has the maximum cardinality of the continuum.

This result raises an interesting question: is it possible at all to have the cardinality bigger than the continuum? Before answering this question let’s consider a couple of more examples.

Definition. Set \(A\) is called countable if it contains the same number of elements as the set of natural numbers \(\mathbb{N}= \{0, 1, 2, 3, \ldots\}\). This is the same as to say that all elements of \(A\) can be arranged into an infinite sequence. We denote it as \(|A|=\aleph_0\).

The following exercises are elementary and compulsory to be solved before you continue.

Exercise. Prove that union of a countable number of countable sets is countable.

Exercise. Prove that Cartesian product of countable sets is countable.

Exercise. Prove that sets of integers and rational numbers are countable.

Exercise. Prove that a set of finite words built from a finite alphabet is countable.

Exercise. Prove that a set of polynomials with integer coefficients is countable.

All of the statements in the exercises above are quite simple, but at the same time important. I think that you will be able to solve these quite easily, otherwise just try to search for the solution on the Internet.

From the exercises we see that products and unions of countable sets are always countable. All of the fractions make up a countable set as well. This should make us suspicious if non-countable sets exist at all. In particular, is continuum countable? The answer to this question is negative. The traditional way of proving it is to employ “Cantor’s diagonal argument”, one form of which I present here.

Cantor’s Theorem. For an arbitrary set \(A\), its powerset (i.e. a set of all subsets, we denote it by \(2^A\)) has a strictly bigger cardinality than \(A\).

Proof. We will prove this theorem by contradiction. Let’s assume that there is a one-to-one mapping \(f: A \to 2^A\). Let $$X=\{x\in A | x\not\in f(x)\}$$ Since \(f\) is one-to-one and \(X\) is a subset of \(A\), we know that there is \(a\in A\) such that \(f(a) = X\). If we assume that \(a\in X\) than by definition of \(X\) we get \(a\not\in f(a)=X\) which is a contradiction. The assumption that \(a\not\in X\) leads to the same contradiction. Thus, both possibilities are impossible and our initial assumption that one-to-one \(f\) exists is false. We conclude that there is no one-to-one mapping between \(A\) and \(2^A\). QED.

If \(|A|=a\) then for cardinality of all subsets of \(A\) we will write \(|2^A| = 2^a\).

Theorem. \(\mathfrak{c} = 2^{\aleph_0}\).

Proof. Any number within a unit segment can be represented as a fraction between 0 and 1 in a binary number system (e.g. any number can be written as something like 0.11001001110…). If we match each digit in this binary representation to a number in \(\mathbb{N}\), then we see that we can map each number within a unit segment to a subset of \(\mathbb{N}\). By Cantor’s theorem we see that the cardinality of a unit segment has to be bigger than the cardinality of \(\mathbb{N}\). QED.

From here we can immediately make some practical statements. The first two of them demonstrate why it makes sense to compare sets using one-to-one mappings.

Statement 1. There are real numbers which are not solutions to any of the polynomial equations with integer coefficients (numbers which appear to be solutions are called algebraic and numbers which cannot be solutions are called transcendental; this statement says that transcendental numbers exist).

This statement follows immediately from the fact that the set of all polynomials is countable, while the set of real numbers is uncountable.

Statement 2. There are numbers such that we cannot write a computer program which prints all of the digits of this number sequentially (such numbers are often called non-computable, although terminology may vary). Note that I am not saying anything about the time which the program takes for computation. This is the statement only about the theoretical possibility of such a program, even in the case when printing the digits will take an infinite amount of time.

This is actually the same statement as statement 1. Each computer program is just a finite string made up of symbols of a finite alphabet and as you have proved above, sets of such strings are always countable. The proof of this fact mirrors the proof for the number of polynomials. Please note that this result is applied irrespectively to the programming language, formal logic or the computer architecture we use. Until programs are strings, non-computable numbers will exist.

Statement 3. Set of all sets cannot exist.

If a set of all sets could exist, it would be “the biggest” set of all, since it has to contain all the elements from all of the sets. Since we know that for an arbitrary set \(A\) its powerset \(2^A\) is “bigger” than \(A\), we conclude that there cannot be “the biggest” set.

The last three statements are simple enough to be proven as an exercise.

Statement 4. Any set of intervals on a real line is at most countable.

Statement 5. Any real function can have at most a countable number of extremums.

Statement 6. Any real increasing function can have at most a countable number of discontinuities.

In fact, there is nothing special about increasing functions. In one of the following blog posts we will generalize the last statement to all functions \(\mathbb{R}\to\mathbb{R}^n\) which have only discontinuities of the first kind. But this is a topic for a separate blog post.

Impossible puzzle

The following story took place in the Soviet Union, where children started their studies at school at the age of 7. It is important to know this for the solution of the problem.

Two mathematicians who hadn’t seen each other for years met in the park and had the following conversation:

“Hi! How are you? Do you already have kids?”

“Hi! I’m fine, thanks! Yes, I have two kids, both of them are still preschoolers.”

“How old are they?”

“The product of their ages is the number of pigeons around that bench.”

“Hmm. I don’t have enough information to learn their ages.”

“The older takes after his mother.”

“Oh, now I see!”

Question: What is the age of the children?

I have to admit that when I was given this problem in my teenage years, I wasn’t able to solve it. When I gave up, I was given a hint: “The order of the phrases in the dialogue is important”. This allowed me to solve the problem within minutes. I am not publishing the solution, but instead, I’ll present a set of similar problems below the statements of which can also be used as a hint. Feel free to share and discuss the solution in the comment section though.

Many years after I was given this problem, I learned that it is a simplification of a much harder mathematical puzzle. Here it is.

Alice thought of two distinct numbers, both bigger than 1. She told the sum of the numbers to Sam (this sum appeared to be smaller than 100) and their product to Peter. Then the following conversation between Sam and Peter occurred:

“I know that you don’t know the numbers which Alice thought of,” told Sam to Peter.

“Now I know them,” Peter answered.

“Now I know them too,” said Sam.

Question: What are Alice’s numbers?

This problem is much harder than the puzzle with children’s ages stated above. I was able to solve it only by writing a program which enumerated all of the possibilities, but I know that many people do it manually using simple arithmetic tricks and tedious bookkeeping. If you cannot solve the problem yourself you can check the solution in Wikipedia. Note that the condition for the sum of Alice’s numbers to be less than 100 is critical. It is clear that a too wide range would make the solution non-unique, but what is interesting is that if the range is too small (if the sum is less than 62) the solution also cannot be found. It is quite difficult and instructive to analyze how the number of solutions depends on the allowed range for the sum.

If this problem still appears too difficult, you can also check “Cheryl’s Birthday” problem from Asian Schools Maths Olympiad. It is just another variation of the same problem of less complexity. (It seems that this is the easiest version of all known to me).

There is one more modification of the same problem which is my favorite.

Alice made up three positive natural numbers. She told the sum of these numbers to Sam and the product of these numbers to Peter. Then the following conversation happened:

“If only I knew that the number which you were given is bigger than the number which I was given, then I’d be able to immediately tell what numbers Alice made up”, said Sam.

“No, the number which Alice told me is smaller than yours, and I know all of the numbers”, Peter responded.

Question: What numbers did Alice make?

This problem is not as immediately straightforward as the problems above, although it is much easier computationally. Once again I am not providing the solution, but you can discuss it in the comments section.

For the reference, the answer to the last problem is numbers 1, 1 and 4. It is quite easy to find these numbers, but you should also be able to demonstrate clearly that these numbers are the only solution which is possible. This problem was given in one of the Russian Math Competitions (unfortunately, I’ve lost the reference) and if the correct answer was given, but the uniqueness of the solution wasn’t shown, the solution wouldn’t count.

Twitter and Facebook accounts

People were asking me how they can follow this blog. My preferred way is to use RSS, but it may be inconvenient for non-techies.

In order to make following the blog easier, I created a Facebook group and a Twitter account where I am going to publish any updates. I am not sure that this is the best way to keep people updated, so if you have better ideas, please let me know.

God’s existence

You may have seen headlines like “Mathematicians proved the existence of God”. I’ve seen it a couple of times. In this post, we will see two of such proofs: a very famous one by Godel (the first version of which appears in Godel’s papers from 1941) and another a little bit less famous by Hartshorne (published 1962).

We’ll use classical higher-order modal logic and first of all I have to explain what it means.

“Classical” in context of logic means that any statement is either true or false. If \(S\) is an arbitrary statement, it means that \(S\lor\neg S\) is true and this is an axiom. Here \(\lor\) means “or” and \(\neg\) means “not”. We refer to this as to the law of excluded middle.

We will also make use of symbol \(\land\) for “and”, \(\rightarrow\) for implication and \(\leftrightarrow\) for logical equivalence.

Two things about implication may be non-intuitive. First of all, according to laws of implication absolutely any statement follows from a false statement. For example, the following seemingly absurd statement is considered to be true: “\(1+2=14\) follows from the fact that \(2+3=1\)”. It may be uncomfortable to accept this kind of reasoning, but this is really a true implication. I’ll explain this in more details in one of the following blog posts, but for now, I can just say that the fact that we can derive arbitrary absurd from wrong statements doesn’t really affect anything. Since the assumption of implication is never true, such rule is never applied. It is still important to know this for understanding the following. We’ll refer to this as to paradoxical implication.

Let’s say that \(A\) and \(B\) are some statements. Statement \(A\rightarrow B\) is equivalent to \(\neg B\rightarrow \neg A\). You can either learn it in any basic logic text or just think about it for a bit and convince yourself that this equivalence is true. We’ll refer to this as to implication contrapositive rule.

If statement \(S\) depends on some object \(x\) and we know that such object always exists, we write it as \(\exists x. S(x)\). If statement \(S\) is true for arbitrary object, we write it as \(\forall x. S(x)\). The second classical assumption is that \(\neg\exists x.S(x)\leftrightarrow\forall x. \neg S(x)\). It is just an axiom and you can convince yourself that this makes sense by just exploring couple of examples.

In statement \(S(x)\), \(S\) is called predicate, while \(x\) is object (in a language of logic it is called “term”, but we will stick to more non-logician friendly terminology). First-order logic (the one which is studied in universities) allows us to apply quantifiers \(\exists\) and \(\forall\) for objects, but not for predicates themselves. We are also not allowed to apply predicates to other predicates. Higher-order logic differs from first-order logic in that it relaxes this requirement, so we may write statements like \(\forall S .\exists x. S(x)\).

Now the interesting part. Modal logic. As I said above, we assume that any statement is either true or false. Modal logic goes further and says that some of the statements can be necessarily true or false, and some of the statements are possibly true or false. We denote it with symbols \(\diamond\) and \(\Xi\) respectively:

  1. Statement is “just” true: \(S\).
  2. Statement is possibly true: \(\diamond S\).
  3. Statement is necessarily true: \(\Xi S\).

Unfortunately, I am not using a standard notation here. “Necessary truth” is traditionally denoted by a shallow square, but this symbol is unavailable in the tool which I use for writing formulas. I hope I’ll be able to fix it in the future and get back to this article to change it.

One way to think about these three modes of truthfulness is to assume that we have many possible worlds. They can be imaginary worlds or some physical alternative realities. We live in one of these worlds. Given this assumption we can interpret these formulas this way:

  1. \(S\): the statement is true in our world, but not necessarily in others.
  2. \(\diamond S\): the statement may be true in some of the worlds, but not necessarily in our world. What is important is that we can imagine the world where \(S\) is true.
  3. \(\Xi S\): the statement is true in all possible worlds we can imagine.

Modal logic has a number of axioms:

M1. \(\neg \Xi S \leftrightarrow \diamond \neg S\). If it is not true that something is true in each world, then there may exist a world where it is false. We can treat this first axiom as a definition of \(\diamond\) through \(\Xi\).

M2. \(\Xi S \rightarrow S\). If something is necessarily true, then it is true in our world as well.

M3. \((\Xi (S \rightarrow T)) \rightarrow ((\Xi S) \rightarrow (\Xi T))\). If in all worlds \(T\) follows from \(S\), then if in all worlds \(S\) is true we also know that in all worlds \(T\) is true as well.

M4. If \(S\) can be proven mathematically (which includes the case when \(S\) is an axiom), then \(\Xi S\).

M5. \(\diamond \Xi S \rightarrow \Xi S\). If \(\diamond\Xi S\) is true, then it means that there exists some world where it is known that \(\Xi S\). If we could transfer ourselves to that world, we would know that \(S\) is necessarily true. So we conclude that \(\Xi S\).

Application of implication contrapositive rule to M5 gives us formula \(\diamond \neg S \rightarrow \Xi \diamond \neg S\). Denoting \(\neg S\) by \(T\) we get an equivalent statement of the same axiom:

M5’. \(\diamond T \rightarrow \Xi \diamond T\)

Modal logic often uses other axioms as well, but we wouldn’t need them.

Now we are ready to prove the existence of God following Hartshorne. I’ll loosely follow the paper by Small here.

Let’s denote statement “God exists” by \(g\). Hartshorne assumed two axioms:

HA1. \(\diamond g\). God may exist (he assumed that even atheists can imagine the existence of God in some alternative reality).

HA2. \(\diamond\neg g \rightarrow \neg g\). It is quite a pessimistic assumption: from the fact that it is possible that God doesn’t exist in some other reality, we immediately conclude that it doesn’t exist in our own reality. (In fact, I am tricking you right now and abusing the initial axiom, but I’ll explain it a little bit later).

Now we are ready for the proof. Below if I don’t mention explicitly where we derive a formula from, it implies that we use the previous formula.

H1. First let’s substitute \(\neg g\) to M5’ axiom: \(\diamond\neg g \rightarrow \Xi\diamond \neg g\). Note that we are not making assumptions about God here, only using tautologies of modal logic.

H2. Substitute \(\Xi g\) to the law of excluded middle and use M1: \((\Xi g)\lor(\diamond \neg g)\).

H3. Applying implication H1 to H2 we are getting \((\Xi g) \lor (\Xi\diamond\neg g)\).

H4. Since we assume HA2 as an axiom, we can apply rule M4 to get \(\Xi(\diamond\neg g \rightarrow \neg g)\)

H5. By M3: \((\Xi\diamond\neg g) \rightarrow (\Xi\neg g)\)

H6. H5 and H3 together gives us \((\Xi g) \lor (\Xi\neg g)\). This eliminates the possibility of accidental God existence. Either God exists in all possible worlds or it cannot exist in any of them.

H7. Axiom M1 is equivalent to \(\neg\neg\diamond g\) (two negations cancel each other), to which we can apply M1 to get \(\neg\Xi\neg g\).

H8. From H7 we know that term \(\Xi\neg g\) in H6 cannot be true. So the only possibility for H6 to be true (and we know that it is true because we proved it) is that \(\Xi g\) is true. This shows that God necessarily exists which completes the proof.

Does this really prove that God exists? It depends if you agree with Hartshorne’s axioms and axioms of modal logic. In fact, this proof doesn’t even say what “God” is. It can be applied for any statement which matches the same set of axioms. For example, substitute “God doesn’t exist” for \(g\) and the same reasoning would apply for the proof that God doesn’t exist.

Also, it is interesting to note that axiom HA2 is equivalent to statement \(g \rightarrow \Xi g\) (using implication contrapositive and M1) which reads as “if God exists in our world, it must exist in any possible world”. This statement looks much stronger than the original axiom and seems to be less probable to be true, even though these two formulations are totally equivalent! Actually, Hartshorne used exactly this “less pessimistic” axiom, but I chose to change it to make it look less shaky at least initially.

Can this proof be considered mathematical? Once again it depends on what you call “mathematics”. It definitely doesn’t use any advanced mathematical concepts and is quite trivial. The only bit of mathematics here is the usage of formulas and a strict agreement on axioms and what logical derivations we consider as acceptable. This is what mathematics is actually is: it is just an agreement on rules which we use and a convenient language which helps our reasoning. Exactly the same argument could be repeated without any formulas, but then it would be much more difficult to follow. In fact, originally a kind of this proof was presented informally by Anselm in XI century and only in 1962 it was made formal by Hartshorne (although honestly, I find the relation to be quite loose).

Does this proof have any value? Of course! It is at least a nice exercise in modal logic and an interesting model you can think of. As was discussed in the post on models, even if some model is not correct, it is still able to give food for thoughts. In this particular case, a formal model made obvious how shaky Anselm’s argument was (which was clear anyway, but this model makes the proof and the vision of its weaknesses accessible to non-philosophers).

Now after a short warm-up with modal logic we are ready to discuss Godel’s proof. In comparison to Hartshorne’s proof, this proof also tries to explain what God actually is. Informally speaking, Godel assumes that each imaginable object has a set of properties and is uniquely described by that set (here I don’t assume any definition of “set” in a set-theoretic sense). This reflects what physicist do when they introduce some object. For example, in order to explain what “electron” is you have to specify its properties: its charge, mass, diameter etc.

Each property is binary: an object can either have a property or not, it cannot have some degree of a property. If object \(x\) has property \(\phi\) we denote it as \(\phi(x)\). This is not limiting at all since properties can be arbitrarily precise (like “an object is 1.23 nanometers in diameter”), but just a consequence of the syntax we use. Also, some of the properties are “positive” and some are not. If \(\phi\) is positive we denote it as \(P(\phi)\). What “positive” means is not specified, but God is such an object that has all positive properties.

It is possible that for some property \(\phi\) there is no object which has this property (consider property “this object does not exist”). If this is not the case and there are some object \(x\) such that \(\phi(x)\) holds, we call \(\phi\) instantiable and \(x\) an instance for property \(\phi\).

In the following, I’ll follow the notation and statements order of Wikipedia.

Axiom 1. \((P(\phi)\land \Xi \forall x . \phi(x)\rightarrow \psi(x)) \rightarrow P(\psi)\)

Formula \(\forall x . \phi(x)\rightarrow \psi(x)\) means that property \(\psi\) follows from property \(\phi\): any object which has property \(\phi\) will also have property \(\psi\). We can also call \(\psi\) a subproperty of \(\phi\). As an example consider properties “Bigger than 1 meter” and “Bigger than 1 millimeter”. The second clearly follows from the first one. What Axiom 1 says is that from positive property only positive properties can follow. It cannot be the case that some positive property implies a negative one.

Axiom 2. \(P(\neg\phi)\leftrightarrow\neg P(\phi)\)

For any property \(\phi\) we have an opposite property denoted by \(\neg\phi\) which reads just as absence of \(\phi\). Axiom 2 says that if the presence of some property is a positive thing than absence of it is not and vise versa. It means that we cannot have neutral properties (or we just don’t consider such properties).

Theorem 1. For any positive property it may exist an object which has this property (or in other words, any positive property is instantiable): \(P(\phi)\rightarrow \diamond \exists x . \phi(x)\)

Proof: Let’s assume that property \(\phi\) is non-instantiable. Then statement \(\phi(x)\) is always false and any statement follows from it by paradoxical implication. So we have statement \(\forall x. \phi(x)\rightarrow \psi(x)\) for arbitrary property \(\psi\). This means that all of the properties are subproperties of \(\phi\) together with their opposites. If \(\phi\) was positive that would contradict Axiom 2. This concludes that no positive property can be non-instantiable. QED

Definition 1. \(G(x)\leftrightarrow \forall\phi . (P(\phi)\rightarrow \phi(x))\)

\(G\) is a property which we’ll call “God-like”. This definition says that a property of being God-like implies having all positive properties (and no negative properties, otherwise it would contradict Axiom 2).

Axiom 3. \(P(G)\) (being God-like is a positive property).

Theorem 2. \(\diamond \exists x . G(x)\) (it is possible that God-like object exists at least in some world).

Proof: This theorem is trivial since God-likeness is a positive property and we know that all positive properties are instantiable by Theorem 1. QED

Definition 2. \(\phi \;\mathrm{ess}\; x \leftrightarrow (\phi(x)\land (\forall \psi . \psi(x)\rightarrow \Xi \forall y . \phi(y)\rightarrow\psi(y)))\)

This defines the essence of an object. Informally speaking, the essence of an object is a property which determines an object uniquely up to a world. Let’s say that \(x\) and \(y\) are the same objects but in the different worlds. We cannot write \(x=y\) because these two objects are still not exactly the same–they live in different worlds. But if you had a way to bring this objects in the same world, they would become indistinguishable and effectively the same physical entity. In order to capture this idea, Godel introduced “essences”–properties which describe objects uniquely within the world.

Given this informal reasoning, you now can understand what the formal definition means. It just says that if some property \(\phi\) is the essence of an object \(x\), then any other object \(y\) which has the same essence \(\phi\), would have exactly the same set of properties in any world imaginable. As I said before, the idea here is that properties describe objects uniquely, but belonging to a world is not a property which can be used to identify objects across the worlds. Definition of essence does the trick.

Axiom 4. \(P(\phi)\rightarrow \Xi P(\phi)\) (if some property is positive, it is positive in all imaginable worlds).

Theorem 3. \(G(x) \rightarrow G\;\mathrm{ess}\; x\) (if \(x\) is God-like, then \(G\) is its essence)

Proof: We know by definition that being God-like determines all of the properties of an object uniquely (it is all positive properties). Since we know that positivity of a property is the same in all of the worlds (by Axiom 4), it means that all of the God-like objects have exactly the same properties in all of the worlds. This is what essence is all about (although it might require some reflection). QED

Definition 3. \(E(x) \leftrightarrow \forall \phi .( \phi \;\mathrm{ess}\; x \rightarrow \Xi \exists y . \phi(y))\)

Property \(E\) is called necessary existence. If object \(x\) necessarily exists, it means that in any imaginable world exists some object with the same essence. It is instructive to compare this with \(\Xi\) operator. While \(\Xi\) tells that some logical statement is true in every possible world, \(E\) makes a statement that some object exists in every possible world with exactly the essence of a given object.

Axiom 5. \(P(E)\)

This is another controversial axiom which says that necessary existence is a positive property. One way to think about it is that positivity of a property is not about being “better”, but about being “greater” in a sense of being bigger, smarter, more powerful, containing more energy, having bigger speed etc. These examples are arbitrary, nothing actually says that being “greater” cannot mean “smaller in size”, but if we continue this intuitive idea, then we can think that the more worlds the object can exist in, the greater it is. This is what this axiom is about.

Theorem 4. God exists (more precisely, in any world exists a God-like object): \(\Xi\exists x. G(x)\)

Proof: Since a God-like object has all positive properties, it also has the property of necessary existence, which gives us formula $$\forall x.G(x)\rightarrow E(x)$$

If we combine this with theorem 2 we get $$\diamond \exists x. G(x)\land E(x)$$

By definition of essence $$\diamond \exists x. G(x)\land \Xi\exists y.G(y)$$

If some object has some two properties, it definitely has one of them, so we get $$\diamond \exists x. \Xi\exists y.G(y)$$

or, since \(x\) is not used in this formula anymore, we drop it and get $$\diamond\Xi\exists y.G(y)$$

Applying axiom of modal logic M5 we get the statement of the theorem. QED

Did this prove that the God really exists? Not really. At least not in a popular religious way. Next time when you see in a newspaper that “scientists proved that God exists”, you know what scientists really did and how it is related to the God which you or your neighbor worship. Most likely, not related at all. And at best it is not more than a language play by formal rules.

Once again, even though this proof doesn’t tell us much about the reality in which we live and that each of the axioms can be criticized, I find this proof is still quite elegant. The play with essences is smart and may be even used in absolutely unrelated modal logic derivations.

In order to get a better feeling of this proof and modal logic I suggest you to solve the following exercise:

Exercise. Let’s assume that definition of essence is formulated as

$$\phi \;\mathrm{ess}\; x \leftrightarrow (\forall \psi . \psi(x)\rightarrow \Xi \forall y . \phi(y)\rightarrow\psi(y))$$

and all of the axioms are still the same. Prove that in this case, Godel’s axioms are self-contradictory. (Hint: start by proving that a non-instantiable property is the essence for any object and get a contradiction from this).

Gorky’s description of a mathematician

I was reading “My Universities” by Maksim Gorky yesterday, and I found a curious description of a mathematician who was his neighbor in 1884:

[…] Three rooms opened on the corridor, two occupied by prostitutes and the third by a consumptive mathematician, tall and cadaverous, and topped with coarse, red hair. Through holes in the dirty rags that barely covered him, one shuddered to see his corpselike ribs and bluish skin.

It would appear that the mathematician sustained himself chiefly on his own nails, which he chewed down till they bled. Day and night he was at his formulas, mumbling numbers, and coughing with dry rasps. The prostitutes were uneasy about him, considering him insane, but out of pity they put bread, tea and sugar outside his door. As he carried them in he snorted like a tired horse. If, for some reason, these gifts failed to appear he would stand in the doorway croaking indignantly, “Bread! Bread!”

In their dark caverns his eyes gleamed with maniac pride, happy in the assurance of his greatness. He had an occasional visitor, a hump-backed cripple with a gray beard, thick lenses straddling his swollen nose, and a sharp smile on his yellow, eunuch’s face. Behind the shut door the two would sit for hours in an incomprehensible silence. But once, late in the night, I was awakened by mad screams from the mathematician, “A prison! I say, a cage, a mousetrap! Geometry is a prison!”

The humpbacked dwarf’s retort was a shrill giggle and a word I could not make out, repeated several times.

“Damn you! Get out!” the mathematician roared.

His expelled visitor stood in the corridor, sputtering and pulling his cape about him, while the mathematician, gaunt and terrifying, in the doorway, his fingers writhing in his hair, shouted, “Euclid’s a fool! A fool! God’s wiser that the Greek. I’ll show you!” And he banged the door so hard things in his room crashed down.

The mathematician’s aim, I discovered, was to prove God’s existence mathematically. He died with the task unfinished.

I think I’ll end up the same way.

Comparative advantage and usefulness of models

Since I still haven’t set up LaTeX, this time we’ll do some economics. Let’s consider two countries, which we will call England and Portugal (any coincidences are accidental). England produces 1 unit of wine per worker per day, Portugal produces 2 units of clothes per worker per day. We can represent this in a table:

England Portugal
Wine 1
Clothes 2

It makes sense that given this setting England will sell wine to Portugal and Portugal will sell clothes to England. Both countries benefit from this. We may also try to draw some simple conclusions from this. For example, one may suggest that the price of a unit of wine would be approximately twice the price of clothes.

It is easy to see this. Let’s assume that the price of a unit of wine is $50, but the price of a unit of clothes is $20. It means that one worker in Portugal makes $40 a day, but one worker in England makes $50 a day. There is an incentive to move from Portugal to England to make $10 more, so if this move is easy enough, quite soon not many people would produce clothes, so the price for clothes will go up. We can make a similar argument if the price of wine would be less than twice the price of clothes.

The reality is more complex than this model. We have to take into account not what one particular worker can produce, but what a whole factory can produce. We also have to consider the cost of production, taxes, salaries and many other factors. All of these doesn’t change the theoretical output of the model much: substitute “a worker moved” to “company moved its business” and you will get exactly the same argument about price balance as above. I’ll keep talking about the productivity of separate workers below just to make an explanation a little bit more accessible, but all of what I am going to say can be made more precise.

Let’s make our model more complex. Now both countries produce wine and clothes (here and below we assume the equal quality of products), but the productivity is not equal:

England Portugal
Wine 1 3
Clothes 3 2

I think this is obvious that in this case England would sell wine to Portugal and Portugal would sell wine to England. At this point, prices will be considerably more complicated, but we are not interested in it. The only important point here is that both countries would trade for the mutual advantage.

Let’s consider this situation:

England Portugal
Wine 1 5
Clothes 3 4

Now Portugal is much better at producing both goods. There is no way for England to export products since both wine and clothes are more expensive to produce in England than in Portugal. At the same time, Portugal could sell its products to England, but England wouldn’t be able to afford to pay for these products. If you want to buy something, you first have to sell something, and as we understood, England is not selling anything. It seems that England is in a dreadful position where either all of the business would move to Portugal or the trade between these countries will be banned and England would just have higher prices and lower quality of life.

In 1817 David Ricardo rightfully challenged the argument which I proposed above with the following example (he used England and Portugal as well). Let’s say that one worker in Portugal switches from producing clothes to producing wine. Portugal’s market will have 4 units of clothes less, but 5 units of wine more. Then two workers in England switch from producing wine to producing clothes. After this England has 6 additional units of clothes and 2 units of wine less.

Now, these two countries can actually trade. If Portugal sends 3 units of wine to England, while England sends 5 units of clothes to Portugal, both countries benefit from this trade. See in a table:

England Portugal
2 workers switch: 1 worker switch:
-2 wine, +6clothes +5 wine, -4 clothes
Sell 5 clothes to Portugal: Sell 3 wine to England:
+1 wine, +1 clothes +2 wine, +1 clothes

It is surprising, isn’t it? We’ve made an additional 3 units of wine and 2 units of clothes for these two countries to share by using trade in a situation where any trade seemed impossible.

Here is what’s going on. It is true that Portugal has an advantage in each of the industries,  but the relative productivity of producing clothes is still higher in England than in Portugal. England is 3 times more efficient in producing clothes than wine, but Portugal is just 1.25 times more effective in producing wine, so switching from producing wine to cloth in England is more efficient than switching from cloth to wine in Portugal. This difference in productivity actually makes it feasible for these countries to trade for a mutual benefit.

This argument was widely used as a ground for international trade. I hope that I was able to convince you that this is actually beneficial. Now let’s look at the slightly more realistic example:

Somalia Switzerland
Computers 1 5
Potatoes 3 4

Now it should be clear that comparative advantage would make the poor country focus on agriculture, while the developed country will focus on hi-tech. This way the poor country has no chances of getting out of poverty, while the rich country is only getting richer. The trade deal which looked perfect in the case of England and Portugal above became a disaster when applied to poor and rich countries.

What does the Ricardian model teach us? Is it good to trade or is it bad? Does the model of Ricardo even work? In most situations it doesn’t give any concrete answers (although it still can be used by politicians, those folks can use any kind of tricks anyway) as well as it doesn’t model well real-life situations. It has too many implicit assumptions which make it difficult to apply. First of all, the model considers only two countries and two products. It doesn’t take into account the demand for these products, cost of transportation, competition on the local market and dozens of other factors. There were many attempts to make this model more realistic, but I am not sure if any of these attempts could be called successful.

Should we throw this model away? Not exactly. Even though it doesn’t work quite well, this model is non-obvious and no one before Ricardo was able to think about the trade from this point of view. Even though this model doesn’t give us many answers, it motivates us to ask many interesting questions, which weren’t asked before this model was proposed. This model fueled follow-up research for centuries. It still can be applied in some very limited context, since this model is still logically sound. And some people think that the model still can be refined to the point when it can be applied directly.

Here’s an important lesson: the good model is not necessarily the one which reflects reality well. Other not so easily measured qualities of a model such as the ability to raise new questions may be even more important than the correctness of the model itself.

This applies not only to mathematical models. Newtonian mechanics is incorrect and works only under very specific conditions of ridiculously low velocities of Earth life and only within measurement error. This doesn’t make Newtonian mechanics bad–it has actually proved to be quite useful for almost all of the engineering so far.

Another example is Godel’s proof of god’s existence. The proof was based on so shaky foundations that Godel himself didn’t even try to publish it during his lifetime. And even though the foundation of this work appeared to be incorrect, this proof inspired serious development in mathematical logic and even affected research of Artificial Intelligence.

The most recent example of this kind which actually motivated this post is Theodore Hill’s  “Evolutionary Theory for Variability Hypothesis”. In short, the model explains mathematically that there are more idiots as well as geniuses among males than among females. The work is based on extremely shaky foundations and the model is incorrect for sure at least in case of the human population, but it doesn’t make the model bad! Actually, quite opposite: the model is still logically sound, the shaky assumptions can be potentially refined in the future and even if it appears that the model cannot be fixed at all, it can potentially raise questions which weren’t asked before.

Unfortunately, in this particular case, the refinements wouldn’t be made and questions wouldn’t be asked because the paper was effectively banned. What is so wrong with the model that it has to be banned? All the arguments are purely political: this result can be used by right-wing propaganda, the result can discourage women from going into tech, after all, it doesn’t match with the idea that men and women are equal. No purely scientific arguments against this paper were presented, but the political ones were enough for the ban. Unfortunately, this is not the first time when scientific results got banned based on political grounds. This is harsh to say, but I cannot resist comparing this situation to the suppressed research of genetics, informatics and other fields in the Soviet Union and with Aryan Physics in Nazi Germany. I would really like to provide other examples instead of these two, but it seems that in the modern world other examples of banning science just doesn’t exist. The case of Hill’s model is just the third such case in modern history. I hope this situation wouldn’t go so far, but the direction which the western academy has taken is worrying.

The hardest logic puzzle ever

Imagine you have a conversation with three gods. One god always tells the truths. Another god always lies. The third god gives random answers. These gods can answer any of your question which admits a yes or no response. The problem is that you don’t know which god is lying, which tells the truths and which is random. Moreover, the gods answer in their own language, in which “yes” and “no” sounds like “da” and “ja”, but you don’t know the translation of these words. By asking as few questions as possible you have to find out which god is random, which god tells the truths and which god always lies.

Take your time to try to solve this problem. Then read on.

First of all, the problem says that we have to solve it using as few steps as possible. It appears that we can estimate the number of required questions even without knowing the questions themselves. We have three gods and three roles for them. The number of different combinations of roles is just a number of permutations of these roles, which is 3!=6. At the same time if you asked n yes/no questions, then you may distinguish between 2n different situations. We need 2n to be bigger than 6, which gives us that we need at least three questions to determine the roles of the gods.

So, what these three questions should be? This is a hard problem and we will solve it in several steps:

  1. We will first solve the problem in the assumption that there is no random god and that the gods answer in English.
  2. After this, we will assume that they answer “da” and “ja” instead of “yes” and “no”.
  3. Eventually, we will solve the initial problem with a random god.

Each of these steps is much easier to solve and I advise you to stop here and try to think about the problem again.

So, we have two gods. One is always lying and the other is always telling the truth. Let’s say that S is some statement and we want to learn truthfulness of it. It appears that we can do it by asking only one question to an arbitrary god.  To understand how to formulate the question, consider first the following truth table:

T S desired answer
0 0 0
0 1 1
1 0 0
1 1 1

Here I denoted by T the fact that we talk to the god which tells the truths and by S, as above, the actual truthfulness of the statement which interests us. 0 means “false” and 1 means “true”. Example: if we talk to the god which tells the truths and the statement S is false, then it matches the third line of a table and the desired answer is “no” (denoted in the table as 0).

The problem is that in the first two rows instead of the desired answer we will get an inverted answer since we are talking to the lying god. But if we ask our question not about S, but about some other statement Q1, which has inverted truth values in the first two rows of a table, we will get the desired result:

T S Q1 answer we get
0 0 1 0
0 1 0 1
1 0 0 0
1 1 1 1

As you can see, in this case, if we ask a question about the truthfulness of Q1, we instead get the correct answer about the truthfulness of PQ1 just means that both statements T and S are logically equivalent, so our question to a god may sound like this: “Could you tell me please, if either both statements ‘you always tell the truth’ and ‘S’ are true, or both of them are false?” You can do your calculation to make sure that the response to this question always matches to the truth value of S, no matter which god you ask this question.

The second problem requires just a slight extension of this approach and you may want to try to do it yourself before moving on.

So, now instead of “yes” and “no” we have responses “ja” and “da” with unknown meaning.  We can make an assumption that “ja” means “yes” (I call this assumption A below) and extend our truth table the same way as we did before:

T A S Q1 Q2 answer we get
0 0 0 1 0 da
0 0 1 0 1 ja
0 1 0 1 1 da
0 1 1 0 0 ja
1 0 0 0 1 da
1 0 1 1 0 ja
1 1 0 0 0 da
1 1 1 1 1 ja

Here Q1 has the same meaning as before and Q2 inverts the value of Q1 in case “ja” means “no”. This may be a little bit difficult to digest at first, but the idea here is quite simple. We assume that “ja” means “yes” and we want to hear the response “ja” if S is true.  But if “ja” means “no”, we want a god to revert his response so he still answers “ja” in case of truth, and this is the purpose of Q2.

How to ask the question of the truthfulness of Q2? We can explain to the god how we build the table above and then ask for a value of Q2. The god knows the values of TA, and S and so he knows what number we are interested in. Then he either tells us the truths or lies, using rules of his own language. You may evaluate different possibilities of the answer yourself and see that he will say “ja” if and only if S is true.

Another approach to ask the same question is to ask it in a more direct fashion by using Conjunctive Normal Form of logical statements, but I’ll skip this since it is fairly simple and you can understand how to do it from Wikipedia article.

The question could be formulated much easier. Let’s once again assume that the gods answer “yes” and “no”. How would a god respond to the question “What would you tell me, if I asked you about the truthfulness of S?” The god which tells the truths would give us the correct value of S, this is simple. But the god which lies will be tricked. Let’s assume that S is true. If you ask a question about S directly, the god which lies would answer “no”. But we are not asking about S, we are asking about the answer he would give, so the god will have to lie again and say “yes”, the value of S. The same logic applies if S is false.

It is easy to extend this approach to the case when the gods answer with “ja” and “da”. The question is the following: “Would you say ‘ja’ if I asked you about the truthfulness of S?” You can work out the logic behind this question yourself.

There are two reasons why I presented two possible solutions for this problem. First of all, I was asked this question at a job interview and I was able to come up with a solution with truth tables myself, so I am particularly proud of it (although I didn’t get that job). Another reason is that it is interesting that the first solution was achieved using quite trivial formal logic, while the second one is based on a language play, which cannot be formalized by mathematical methods. This means that, for example, the first solution could be potentially found by Artificial Intelligence, but the second, short and elegant, couldn’t even be represented using the methods of AI available today. This is a fundamental limitation of all of the formal systems which mathematics and AI use in comparison to a “real” logic used by humans.

After solving the second subproblem, we know how to ask for any information from truth-telling or lying god using only a single question. We just have to eliminate the god which gives us totally random answers. I’ll present the questions as though we could ask them directly, but you have to keep in mind that these questions should be “translated” to a form which allows us to get a correct response. The next paragraph contains a solution to the entire problem and you know what to do.

Let’s say that we enumerated the gods as 1, 2, 3. We ask the first god: “Is the second god Random?”. If he answers “yes”, it may be the truth, or it may just mean that the first god gave us a random response. In both cases, we can conclude that the third god is not random and we can proceed to ask questions to him. If the response to the question is “no”, then we know for sure that the second god is not random. The rest is trivial.

In the solution presented above, we didn’t define what “random” actually is, but it can mean different things. This puzzle was initially published by George Boolos and the definition which he used assumed that the random god flips the coin before each answer and the result of this flipping defines if he tells the truth or lies.

At first sight, it may look like a valid definition for randomness, since to the question “Are all roses are red?” we may receive responses “yes” or “no” with equal probability. At the same time, to the question “What would you tell me if I asked you whether all the roses are red?” he would give the response “no”. He flips the coin to decide if he tells the truths, but in both cases, he got caught into the trap of our indirect question. “Yes” is impossible and he doesn’t look like random anymore. The solution I presented above works for a truly random god as well, but the fact that the definition of randomness with flipping a coin doesn’t make him truly random is remarkable.

The last bit. We assumed that even if we don’t know the meaning of the words “ja” and “da”, we at least know what possible responses might be. This is a lot of information already! What if we didn’t know even a single possible word in god’s language? It is easy to solve the problem in four steps. First, ask an arbitrary question to learn at least one word from god’s vocabulary. Then apply the strategy derived above. This straightforward solution is easy, but is it possible to solve the problem with fewer steps?

If one of the gods is truly random (without flipping a coin), then the solution in three steps is impossible. It is clear that the last question can be asked only to a non-random god. If you think about it for a bit, this observation is also quite remarkable: we cannot get any information by asking the last question to the random god, but asking a question to the random god the first question is actually giving us information on who is not random (we used this trick in our initial solution). This is quite difficult to explain without going into depths of theory, but a short story is that this happens because a randomized answer can only reduce uncertainty, but cannot give you precise information.

So we have to determine which god is not random by using only two questions. The first response is some word which we’ve never heard before, so it cannot give any information. It means that the second question cannot depend on the response to the first question. If you add to the mixture true randomness, it appears that even though we can get some information from these two responses, we cannot confidently eliminate a random god. Details, unfortunately, are quite complicated and I will skip it.

If the random god is defined by flipping a coin, the solution in three steps is possible. The first two questions are the following:

  1. First ask the first god: “Let’s consider English transliteration of words ‘yes’ and ‘no’ in your language. If I asked you whether you are random would you answer with a response which goes alphabetically first in the list of transliterations?”
  2. Then ask the second god: “Would you give me the same response to the same question which I just asked?”

I leave it as a nice exercise to understand how you can make sense out of responses to these two questions and how to complete the puzzle in this case.

Monty Hall problem

I’ll start my blog with the Monty Hall problem. It is a well-known classic problem and quite a trivial one, but I still want to present it here, because for me personally, it means a lot. It was one of the first “paradoxes” I learned in my life and it has greatly affected my interests and mathematical taste, so I have some special kind of respect for this problem.

Imagine that you are on a TV show in front of three closed doors. There is a car behind one of the doors and if you pick the right door, you win the car. After you select the door, the host doesn’t open it immediately but instead opens another door without a prize. Now you have a choice out of two doors and you are given a chance to change your initial guess. Would it make sense to change the door or not?

This problem doesn’t need any maths knowledge except basic arithmetic. If you don’t know this problem yet, stop reading here and try to solve it on your own.

One incorrect response I saw is the following. We have three doors and the car is behind one of the doors. So the chance to win is 1/3. (This is correct). Since each door has the same chances of having a car behind, it doesn’t matter if you change the door.

In response to the idea above one may propose another incorrect reasoning: after one losing door is eliminated, each of the remaining doors has the probability of 1/2 of being winning. So it is still doesn’t matter if you change the door, you always get 1/2 chance of winning. Can you see what’s wrong with this solution?

The correct answer is the following: if you don’t switch the door, then the probability of winning is 1/3. If you switch the door, the probability is 2/3.

One way to see how this works is to consider the probability that your initial choice is incorrect, which is 2/3. The phrase “the door I’ve chosen is incorrect” may be rephrased as “one of the doors I haven’t chosen is correct”. This probability stays the same after the host eliminates one of the doors. After this you know that the remaining door is “the door I haven’t chosen”, and the probability it has the car is the same that you initially made a mistake. This is 2/3.

This solution gives the right answer, but it doesn’t give the right intuition. It is not entirely clear immediately why this solution is more correct that the solution which gives us answer 1/2. I’ll try to explain this below.

Before I go to the solution, you can try playing this game several times with your kid. You will be a host and your kid will try to win. Play this game 20 times and check how many times your kid was able to win. Then suggest her a strategy to always keep her initial choice. Play another 20 times. After this, she can realize that she is winning less than before (if she was making choices more or less randomly). If you’re lucky, she would suggest playing again, but this time she changes the door each time. If not, you can suggest this idea yourself (or this time she can be the host and you can demonstrate a good result by switching the door each time). Amusingly, when you or your kid changes the door each time, you may expect the results to be better than in two previous tries. Now you can go to an explanation.

The problem can be trivially explained by enumeration of all of the possible cases. Let’s assume that we’ve chosen the first door. There are three possibilities where the car can be:

1. A car is behind the first door (the one you chose). If we don’t change the door when we have the opportunity, we win. If we change the door, we lose.

2. A car is behind the second door. If don’t change the door, we lose. Otherwise, we win.

3. A car is behind the third door. If don’t change, lose. Otherwise, win.

As you see, if we keep the initial door, we win only in one case out of three. If we change the door, we win in two cases out of three. This gives us probabilities of 1/3 for keeping our choice and 2/3 changing the door.

What if initially we chose the second or the third door? The situation will be symmetric to what we’ve just seen, so it doesn’t affect the probability.

If you know the basics of probability theory, you can easily solve this problem with the law of total probability and see that the solution just repeats the solution above. But if we used this law initially, it wouldn’t look so convincing (at least if you’re just learning basics). This is not a coincidence: probability theory is just about counting possibilities at its heart, and it is always useful to remember this. This often makes reasoning simpler and helps you to distinguish the correct reasoning from incorrect one.

Another approach to take on such kind of problems is to generalize them. Let’s say that instead of 3 doors, TV-show uses 100 doors. Only one has a prize behind. You select one door, then the host opens 98 doors which don’t have prizes. Would you switch the door in this case? The answer should be obvious.

Problem. Let’s say that there are N doors in total, W of them are with prizes. You are allowed to choose one door. After this, the host opens K doors without a prize and you’re allowed to choose another door. What is your probability to win a prize in this case?

This is quite a simple problem if you know about binomial coefficients so I’ll leave it as an exercise. Feel free to post your solution in comments.

Not only the problem itself but also its history is interesting though.

Initially, this problem was proposed by Steve Selvin in 1975, but it became well-known only when a reader of Parade magazine asked this question in the “Ask Marilyn” column. Marylin vos Savant gave a correct answer, but she immediately started receiving letters explaining why she is wrong. Most of these letters were from people with a Ph.D. degree. Here’s a couple of examples, taken from Marylin vos Savant website:

[…] You blew it! Let me explain. If one door is shown to be a loser, that information changes the probability of either remaining choice, neither of which has any reason to be more likely, to 1/2. As a professional mathematician, I’m very concerned with the general public’s lack of mathematical skills. Please help by confessing your error and in the future being more careful.
Robert Sachs, Ph.D.
George Mason University

You are utterly incorrect about the game show question, and I hope this controversy will call some public attention to the serious national crisis in mathematical education. If you can admit your error, you will have contributed constructively towards the solution of a deplorable situation. How many irate mathematicians are needed to get you to change your mind?
E. Ray Bobo, Ph.D.
Georgetown University

I am in shock that after being corrected by at least three mathematicians, you still do not see your mistake.
Kent Ford
Dickinson State University

It seems that Ph.D. status doesn’t guarantee sanity or at least basic knowledge of arithmetic, unfortunately. At the same time, Ms. Savant was also receiving letters from primary school teachers who tried this experiment with young kids. And these kids and teachers did solve the problem correctly and was amused by the result. (My advice above to try it with your kid wasn’t random).

What is more interesting is the difference between confusion which it made in 1990 and the acceptance it has now. This paradox is learned in many schools and universities around the world, and I haven’t heard that it provokes any doubts among the students. The mathematical education hasn’t changed a lot in these 28 years,  so what’s going on here? I am afraid that the answer is that students are just taught to believe that the proof from the textbook is correct, but don’t understand it. If people with Ph.D.s weren’t able to solve the problem, then how can 17 y.o. kids solve it in an exam? This is why I not only present the correct solution but at first give incorrect ones–to sparkle a doubt.

The last bit. As far as I know, originally the problem didn’t mention any particular TV-show explicitly (I may be wrong, because I didn’t find reliable sources). Monty Hall was a host in a real TV-show called “Let’s make a deal!” in the seventies and Steve Selvin wrote a letter to him. I don’t know what was in that letter (if you know where I can read it, please let me know), but the response from Monty became popular and since then the problem has Monty Hall name:

May 12, 1975

Mr. Steve Selvin
Asst. Professor of Biostatistics
University of California, Berkeley

Dear Steve:

Thank you for sending me the problem from “The American Statistician.”

Although I am not a student of a statistics problems, I do know that these figures can always be used to one’s advantage, if I wished to manipulate same. The big hole in your argument of problems is that once the first box is seen to be empty, the contestant cannot exchange his box. So the problems still remain the same, don’t they… one out of three. Oh, and incidentally, after one is seen to be empty, his chances are no longer 50/50 but remain what they were in the first place, one out of three. It just seems to the contestant that one box having been eliminated, he stands a better chance. Not so. It was always two to one against him. And if you ever get on my show, the rules hold fast for you–no trading boxes after the selection.

Next time let’s play on my home grounds. I graduated in chemistry and zoology. You want to know your chances of surviving with our polluted air and water?

Sincerely,
Monty