Arbitrary distribution using biased coin

Problem. Let’s assume that you have a biased coin, but you don’t know the probabilities of its sides. Using the coin, how would you choose one of the numbers 1, 2 or 3 with equal probability?

Stop reading here and try solving the problem yourself before moving on.

Solution. First of all, I have to note that this problem have a trivial solution that works only for this particular question, not in a general case. One such solution was proposed by a reader in the comments section. In this post, we present a solution that will work for arbitrary distributions, no matter how asymmetric they are.

The problem becomes easier if we split it into two parts. Firstly we will learn how to emulate a fair coin using a biased one. After this we describe a procedure which allows us to get numbers from an arbitrary probabilistic distribution.

Let’s denote the probability of tails as \(p\). Then if we toss the coin twice, we can get one of the four outcomes:

  1. Tails both times, with probability \(p^2\),
  2. Heads both times, with probability \((1-p)^2\),
  3. Tails first time, heads second time, probability \(p(1-p)\).
  4. Heads first time, tails second time, probability \(p(1-p)\).

As you see, outcomes 3 and 4 have equal chances to appear. Thus, in order to emulate an experiment with probability 50/50 (fair coin), we can toss the coin twice. If both tosses give the same outcome, then discard it and toss again. When we get tails in one toss and heads in the other, we choose the outcome which comes first.

Example. Consider a sequence of tosses T, T, H, H, T, H, where I denoted tails as T and heads as H. This sequence describes an experiment in which we toss the coin twice and both times we get tails. We discard this result and toss twice again. This time we get two heads. Now, we toss twice one more time. First, we get tails and then heads. We write “tails” as our outcome.

Since now we can assume that we have a fair coin. We will be repeating tosses sequentially and interpreting them as digits in a binary representation of a number \(s\) in interval \([0, 1]\):

$$s=\sum\limits_{i=1}^{+\infty} \frac{t_i}{2^i}$$

Here \(t_i\) denotes the outcome of i-th toss and can be either 0 or 1. In order to get one of the three values at random, we split \([0,1]\) into three subintervals \([0, 1/3), [1/3, 2/3], (2/3, 1]\). (Since particular points like 1/3 has zero probability, we don’t care into which interval we include them). The outcome of the experiment is the number of the subinterval in which \(s\) lies. Since we assume that each digit of \(s\) is uniform, \(s\) itself is uniform in \([0,1]\).

In fact we don’t need the exact value of \(s\). The only interesting bit is the subinterval in which \(s\) lies, so we don’t have to make an infinite number of tosses. Let’s say that the first toss gave us value 1. We know that \(s=0.1\ldots\) (in binary) and it lies somewhere in \([1/3, 2/3]\). If the next toss gives us 0, then \(s=0.10\ldots\) and \(s\in[1/3, 0.59]\). If the next toss gives us 0 again, then \(s=0.100\ldots\) and we know that \(s\in[1/3, 0.46)\) which is a subinterval of \([1/3, 2/3]\). The outcome is the second value.

This scheme can be generalized in an obvious way. If we want to get an arbitrary distribution we just have to split \([0,1]\) into a given number of subintervals of lengths that match the probabilities of the values.

Infinite sets

In this post, I’ll cover more or less the basic topics which are usually covered in any set theory course. I expect that the theorems presented in this post will be used extensively in my next articles and I will be referencing this post a lot.

I am skipping many formal details getting straight to the main idea of proofs, but all of the proofs can be made strict without much effort. I also present simpler results without proofs in the form of exercises. I am too lazy to prove each bit of trivialities, sorry.

We will start from a seemingly absurd statement.

Theorem. A unit line segment \([0, 1]\) contains exactly the same number of points as a filled unit square \([0, 1]^2\).
Proof. Consider a decimal fraction representation of points of the unit segment and let function \(f\) map these points to a unit square by taking all odd positioned digits of decimal representation to the first coordinate of the square and all even positioned digits to the second coordinate of the square. This cryptic rule can be made clear with the help of an example. Point \(0.1234567\) has odd-positioned digits 1, 3, 5, 7 and even-positioned digits 2, 4, 6 in its decimal representation, so \(f\) will take this point into a point with coordinates \((0.1357, 0.246)\). It is easy to see that this mapping is one to one. QED.

Exercise. As promised, I provide quite informal proofs and sometimes admit holes in them. The proof above has one such hole. Can you see it? Can you resolve it? (Hint: any number has two decimal representations, for example number 0.2 can be also represented as an infinite fraction 0.19999?).

This proof can be easily extended to the case of a unit cube, or even a unit n-dimensional cube. Instead of mapping a point to two coordinates, map it to \(n\) coordinates and use each n-th digit of decimal representation for this purpose.

This result is quite counter-intuitive and may seem even wrong. After all, a line segment has area 0, but a unit square has area 1. A unit square has volume 0, but a unit cube has volume 1. A unit square consists of an infinite number of unit line segments, while a unit cube consists of an infinite number of squares. How can it be that these three figures have the same number of points?

The catch here is in the terminology. When we say “line segment”, “square” or “cube” we immediately visualize geometric objects. The same terminology as in geometry is used in set theory, but set theory lacks all of the geometric sense. Things like “length”, “angle” or “area” are just not defined in set theory and we cannot make any references to them.  Without these notions, set-theoretic figures are not exactly the same as geometric figures, even though we can describe them using coordinates.  In particular, you are not supposed to draw set-theoretic figures on a sheet of paper, since any picture immediately suggests geometric information which doesn’t exist. As soon as you drop all of your geometric intuition, the only remaining way of dealing with these figures is to analyze functions that map points between them.

The same consideration applies to physical intuition. Mathematical objects don’t usually have mass, atoms inside, density etc, so when we stretch objects, they don’t become less dense. As an example, take function \(f:x\mapsto 2x\) which maps interval \([0,1]\) to \([0,2]\). Clearly, \([0,1]\) is a subset of \([0,2]\), but at the same time \(f\) is one-to-one, so each point in \([0,1]\) can be mapped uniquely with a point in \([0,2]\) and vice versa. Thus from one point of view \(f\) stretches the interval two times, since any subinterval is mapped to a subinterval of twice the length (e.g. \([0.1,0.2]\) is mapped to \([0.2, 0.4]\)). At the same time its image is not less “dense” than its preimage, so in this sense the function just doubles the interval. Once again, all of this seems paradoxical only until you appeal to physical intuition. As soon as you accept the idea that sets have nothing to do with geometry or physics, you should be fine with these abstractions.

The preceding paragraph can be summarized as a rule: stretching of an object doesn’t change the number of points in it.

To make this reasoning more clear, consider a set of integers. Are there more integer numbers than even numbers? From one point of view yes, since any even number is an integer, but not any integer is even. At the same time function \(x\mapsto 2x\) provides a way to match each integer with exactly one even number and vice versa. This makes us conclude that the number of integers and even integers is the same.

Which response is correct? In fact, both of them could be correct, depending on how you define “more”. As soon as we leave the world of finite sets, we cannot really say that one set has more elements than another in the same sense as we compare sets of 2 and 3 elements. Looking into subsets is not very productive though. We know that \([0,1]\subset[0,2]\), but at the same time \([100, 101]\not\subset[0,2]\), even though it is the same set moved on a real line. We clearly don’t want moving sets around affecting our reasoning about sizes of sets. Thus, it appears to be more natural when we use one-to-one mappings to compare cardinality of sets and so this definition is usually used.

Definition. Function \(f\) is called injection, if it maps different elements into different elements. Formally, for injective function, if \(f(x)=f(y)\) then \(x=y\).

Definition. We say that set \(A\) is not smaller than set \(B\) (or has not smaller cardinality, or has not less elements) and denote it as \(|A|\le|B|\) if there is an injection from \(A\) to \(B\). We say that set \(A\) has the same number of elements as \(B\) (or the same cardinality, or the same number of elements) and denote it as \(|A|=|B|\) if there is a one-to-one mapping between \(A\) and \(B\).

Definition. We say that set \(A\) has the cardinality of the continuum if there exists a one-to-one mapping between \(A\) and a unit segment. We denote this as \(|A|=\mathfrak{c}\).

Schröder-Bernstein Theorem. If \(|A|\le|B|\) and \(|B|\le|A|\), then \(|A|=|B|\).

The statement of the theorem seems quite intuitive, but given all of the subtlety of infinite sets, this requires a formal proof. I am skipping it, but if you are interested you can check one of the approaches on Wikipedia.

Exercise. Prove that the whole real line \(\mathbb{R}\) has the cardinality of the continuum.

Exercise. Prove that an arbitrarily dimensional space \(\mathbb{R}^n\) has the cardinality of the continuum.

Exercise. Given \(|A|\le|B|\le|C|\) and \(|A|=|C|\), prove that \(|A|=|B|\).

Consider an arbitrary figure in n-dimensional space. If this figure has a subset of continuum cardinality, we end up in the situation of the exercise above. Thus we conclude that any figure in any finite-dimensional space has the maximum cardinality of the continuum.

This result raises an interesting question: is it possible at all to have the cardinality bigger than the continuum? Before answering this question let’s consider a couple of more examples.

Definition. Set \(A\) is called countable if it contains the same number of elements as the set of natural numbers \(\mathbb{N}= \{0, 1, 2, 3, \ldots\}\). This is the same as to say that all elements of \(A\) can be arranged into an infinite sequence. We denote it as \(|A|=\aleph_0\).

The following exercises are elementary and compulsory to be solved before you continue.

Exercise. Prove that union of a countable number of countable sets is countable.

Exercise. Prove that Cartesian product of countable sets is countable.

Exercise. Prove that sets of integers and rational numbers are countable.

Exercise. Prove that a set of finite words built from a finite alphabet is countable.

Exercise. Prove that a set of polynomials with integer coefficients is countable.

All of the statements in the exercises above are quite simple, but at the same time important. I think that you will be able to solve these quite easily, otherwise just try to search for the solution on the Internet.

From the exercises we see that products and unions of countable sets are always countable. All of the fractions make up a countable set as well. This should make us suspicious if non-countable sets exist at all. In particular, is continuum countable? The answer to this question is negative. The traditional way of proving it is to employ “Cantor’s diagonal argument”, one form of which I present here.

Cantor’s Theorem. For an arbitrary set \(A\), its powerset (i.e. a set of all subsets, we denote it by \(2^A\)) has a strictly bigger cardinality than \(A\).

Proof. We will prove this theorem by contradiction. Let’s assume that there is a one-to-one mapping \(f: A \to 2^A\). Let $$X=\{x\in A | x\not\in f(x)\}$$ Since \(f\) is one-to-one and \(X\) is a subset of \(A\), we know that there is \(a\in A\) such that \(f(a) = X\). If we assume that \(a\in X\) than by definition of \(X\) we get \(a\not\in f(a)=X\) which is a contradiction. The assumption that \(a\not\in X\) leads to the same contradiction. Thus, both possibilities are impossible and our initial assumption that one-to-one \(f\) exists is false. We conclude that there is no one-to-one mapping between \(A\) and \(2^A\). QED.

If \(|A|=a\) then for cardinality of all subsets of \(A\) we will write \(|2^A| = 2^a\).

Theorem. \(\mathfrak{c} = 2^{\aleph_0}\).

Proof. Any number within a unit segment can be represented as a fraction between 0 and 1 in a binary number system (e.g. any number can be written as something like 0.11001001110…) If we match each digit in this binary representation to a number in \(\mathbb{N}\), then we see that we can map each number within a unit segment to a subset of \(\mathbb{N}\). By Cantor’s theorem we see that the cardinality of a unit segment has to be bigger than the cardinality of \(\mathbb{N}\). QED.

From here we can immediately make some practical statements. The first two of them demonstrate why it makes sense to compare sets using one-to-one mappings.

Statement 1. There are real numbers which are not solutions to any of the polynomial equations with integer coefficients (numbers which appear to be solutions are called algebraic and numbers which cannot be solutions are called transcendental; this statement says that transcendental numbers exist).

This statement follows immediately from the fact that the set of all polynomials is countable, while the set of real numbers is uncountable.

Statement 2. There are numbers such that we cannot write a computer program which prints all of the digits of this number sequentially (such numbers are often called non-computable, although terminology may vary). Note that I am not saying anything about the time which the program takes for computation. This is the statement only about the theoretical possibility of such a program, even in the case when printing the digits will take an infinite amount of time.

This is actually the same statement as statement 1. Each computer program is just a finite string made up of symbols of a finite alphabet and as you have proved above, sets of such strings are always countable. The proof of this fact mirrors the proof for the number of polynomials. Please note that this result is applied irrespectively to the programming language, formal logic or the computer architecture we use. Until programs are strings, non-computable numbers will exist.

Statement 3. Set of all sets cannot exist.

If a set of all sets could exist, it would be “the biggest” set of all, since it has to contain all the elements from all of the sets. Since we know that for an arbitrary set \(A\) its powerset \(2^A\) is “bigger” than \(A\), we conclude that there cannot be “the biggest” set.

The last three statements are simple enough to be proven as an exercise.

Statement 4. Any set of intervals on a real line is at most countable.

Statement 5. Any real function can have at most a countable number of extremums.

Statement 6. Any real increasing function can have at most a countable number of discontinuities.

In fact, there is nothing special about increasing functions. In one of the following blog posts we will generalize the last statement to all functions \(\mathbb{R}\to\mathbb{R}^n\) which have only discontinuities of the first kind. But this is a topic for a separate blog post.

Impossible puzzle

The following story took place in the Soviet Union, where children started their studies at school at the age of 7. It is important to know this for the solution of the problem.

Two mathematicians who hadn’t seen each other for years met in the park and had the following conversation:

“Hi! How are you? Do you already have kids?”

“Hi! I’m fine, thanks! Yes, I have two kids, both of them are still preschoolers.”

“How old are they?”

“The product of their ages is the number of pigeons around that bench.”

“Hmm. I don’t have enough information to learn their ages.”

“The older takes after his mother.”

“Oh, now I see!”

Question: What are the ages of the children?

I have to admit that when I was given this problem in my teenage years, I wasn’t able to solve it. When I gave up, I was given a hint: “The order of the phrases in the dialogue is important”. This allowed me to solve the problem within minutes. I am not publishing the solution, but instead, I’ll present a set of similar problems below the statements of which can also be used as a hint. Feel free to share and discuss the solution in the comment section though.

Many years after I was given this problem, I learned that it is a simplification of a much harder mathematical puzzle. Here it is.

Alice thought of two distinct numbers, both bigger than 1. She told the sum of the numbers to Sam (this sum appeared to be smaller than 100) and their product to Peter. Then the following conversation between Sam and Peter occurred:

“I know that you don’t know the numbers which Alice thought of,” told Sam to Peter.

“Now I know them,” Peter answered.

“Now I know them too,” said Sam.

Question: What are Alice’s numbers?

This problem is much harder than the puzzle with children’s ages stated above. I was able to solve it only by writing a program which enumerated all of the possibilities, but I know that many people do it manually using simple arithmetic tricks and tedious bookkeeping. If you cannot solve the problem yourself you can check the solution in Wikipedia. Note that the condition for the sum of Alice’s numbers to be less than 100 is critical. It is clear that a too wide range would make the solution non-unique, but what is interesting is that if the range is too small (if the sum is less than 62) the solution also cannot be found. It is quite difficult and instructive to analyze how the number of solutions depends on the allowed range for the sum.

If this problem still appears too difficult, you can also check “Cheryl’s Birthday” problem from Asian Schools Maths Olympiad. It is just another variation of the same problem of less complexity. (It seems that this is the easiest version of all known to me).

There is one more modification of the same problem which is my favorite.

Alice made up three positive natural numbers. She told the sum of these numbers to Sam and the product of these numbers to Peter. Then the following conversation happened:

“If only I knew that the number which you were given is bigger than the number which I was given, then I’d be able to immediately tell what numbers Alice made up”, said Sam.

“No, the number which Alice told me is smaller than yours, and I know all of the numbers”, Peter responded.

Question: What numbers did Alice make?

This problem is not as immediately straightforward as the problems above, although it is much easier computationally. Once again I am not providing the solution, but you can discuss it in the comments section.

For the reference, the answer to the last problem is numbers 1, 1 and 4. It is quite easy to find these numbers, but you should also be able to demonstrate clearly that these numbers are the only solution which is possible. This problem was given in one of the Russian Math Competitions (unfortunately, I’ve lost the reference) and if the correct answer was given, but the uniqueness of the solution wasn’t shown, the solution wouldn’t count.