
What is a probability simplex?

A probability simplex is a mathematical space in which each point represents a probability distribution over a finite number of mutually exclusive events. Each event is often called a category*, and the variable K is usually used to denote the number of categories.

A point on a probability simplex can be represented by K non-negative numbers that add up to 1. Here are some examples:

A point in a simplex where K=2: (0.6, 0.4)
A point in a simplex where K=3: (0.1, 0.1, 0.8)
A point in a simplex where K=6: (0.05, 0.2, 0.15, 0.1, 0.3, 0.2)
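
The two conditions above (non-negative coordinates that sum to 1) can be checked mechanically. The following is a minimal sketch in Python; the function name `is_on_simplex` and the tolerance `tol` are illustrative choices, with the tolerance there only to absorb floating-point rounding.

```python
import math

def is_on_simplex(point, tol=1e-9):
    """Return True if every coordinate is non-negative and they sum to 1.

    `tol` is an assumed tolerance for floating-point rounding error.
    """
    non_negative = all(p >= 0 for p in point)
    sums_to_one = math.isclose(sum(point), 1.0, abs_tol=tol)
    return non_negative and sums_to_one

# The three example points listed above, for K=2, K=3, and K=6.
examples = [
    (0.6, 0.4),
    (0.1, 0.1, 0.8),
    (0.05, 0.2, 0.15, 0.1, 0.3, 0.2),
]
results = [is_on_simplex(p) for p in examples]
```

A tuple such as (0.5, 0.6) fails the check because it sums to 1.1, and (-0.1, 1.1) fails because one coordinate is negative even though the sum is 1.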


When K=2, this space is a line segment; when K=3, it is a triangle; and when K=4, it is a tetrahedron. In each case, the simplex is a (K-1)-dimensional object, because the requirement that the numbers sum to 1 reduces the dimensionality by 1**.
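
Written out in symbols, following this post's convention that the K-simplex has K coordinates, the definition and the dimension count look roughly like this. The name \Delta_K is a label chosen here for illustration and is not notation from the post itself.

```latex
\Delta_K = \left\{ (p_1, \dots, p_K) \in \mathbb{R}^K \;:\; p_i \ge 0 \text{ for all } i,\ \sum_{i=1}^{K} p_i = 1 \right\},
\qquad
p_K = 1 - \sum_{i=1}^{K-1} p_i .
```

The second equation shows why only K-1 coordinates are free: once the first K-1 are chosen, the last one is determined.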

The probability simplex is very common in Bayesian inference. For example, suppose we are deciding between mutually exclusive hypotheses A, B, and C, thus making our hypothesis space {A, B, C}. Our belief about the relative probabilities of A, B, and C being true is a point on a probability simplex where K=3.

Each “corner” or “vertex” of a probability simplex represents the case where all of the probability is placed on a single category. For example, in the case above with hypothesis space {A, B, C}, the point (A=0, B=1, C=0) represents a belief with all probability placed on B.
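
As a small sketch of the idea, the corners of the K=3 simplex can be built as tuples with a single 1, and any other belief can be written as a weighted average of those corners. The helper names `vertex` and `mix` and the example weights (0.2, 0.5, 0.3) are hypothetical and chosen only for illustration.

```python
hypotheses = ["A", "B", "C"]

def vertex(name, space):
    """Return the corner of the simplex that puts all probability on `name`."""
    return tuple(1.0 if h == name else 0.0 for h in space)

def mix(weights, space):
    """Weighted average of the corners.

    `weights` is assumed to be a point on the simplex itself, so the
    result is the belief that assigns weights[i] to the i-th hypothesis.
    """
    corners = [vertex(h, space) for h in space]
    return tuple(
        sum(w * corner[i] for w, corner in zip(weights, corners))
        for i in range(len(space))
    )

corner_b = vertex("B", hypotheses)         # all belief placed on B
belief = mix((0.2, 0.5, 0.3), hypotheses)  # an interior point of the triangle
```

Here `corner_b` is the point (A=0, B=1, C=0) from the text, and `belief` simply recovers the weights it was built from, which is another way of saying that every point on the simplex is a mixture of its corners.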

If a single hypothesis is ruled out as impossible, its coordinate is 0 and the probabilities of the other K-1 hypotheses sum to 1. This “boundary” of the K-simplex is therefore itself a (K-1)-simplex.
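
One way to see this in code is to drop a hypothesis and rescale the remaining probabilities so they sum to 1 again. This is only a sketch: the function name `rule_out` and the starting belief are made up for illustration, and proportional rescaling (which matches conditioning on the excluded hypothesis being false) is an assumed choice of how to redistribute the probability.

```python
def rule_out(belief, excluded):
    """Remove one hypothesis and renormalize the rest.

    `belief` maps hypothesis names to probabilities on a K-simplex.
    The result has K-1 entries that again sum to 1.
    """
    remaining = {h: p for h, p in belief.items() if h != excluded}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("all probability was on the excluded hypothesis")
    return {h: p / total for h, p in remaining.items()}

belief = {"A": 0.5, "B": 0.25, "C": 0.25}  # hypothetical point with K=3
reduced = rule_out(belief, "C")            # a point with K=2
```

With these numbers, `reduced` assigns about 2/3 to A and 1/3 to B, which is a point on the simplex over the two surviving hypotheses.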

In game theory, a “mixed strategy” among K different potential actions by an agent lives on a K-simplex.
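
To make the game-theory reading concrete, a mixed strategy can be treated as the weights for a random draw over actions. The sketch below uses a hypothetical three-action game (rock, paper, scissors) and an arbitrary strategy; neither comes from the text above. It relies on `random.Random.choices` from Python's standard library, with a fixed seed so the run is repeatable.

```python
import random

actions = ["rock", "paper", "scissors"]  # hypothetical action set, K=3
strategy = (0.5, 0.25, 0.25)             # a point on the simplex over actions

def play(strategy, actions, rng):
    """Draw one action, with each action chosen with its strategy probability."""
    return rng.choices(actions, weights=strategy, k=1)[0]

rng = random.Random(0)  # seeded generator for repeatable draws
draws = [play(strategy, actions, rng) for _ in range(10_000)]

# Observed frequencies should land near the strategy's coordinates.
frequencies = tuple(draws.count(a) / len(draws) for a in actions)
```

A “pure strategy” is the special case where the strategy is a corner of the simplex, for example (1, 0, 0), so the same action is drawn every time.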

Degenerate and Trivial Cases

When K=0, a point on the simplex would have no coordinates at all, yet those coordinates would still have to sum to one. The empty sum is always equal to zero, so no such point exists. Therefore, the 0-simplex is the empty set.


When K=1, the simplex consists of points with a single coordinate that must equal one. The only such point is (1), so the 1-simplex is equivalent to a one-point, or unit, space. In inference, this is the case where there is only one working hypothesis, and the observer is therefore forced to place their entire belief on it.

When K=2, a point on the simplex corresponds to a point on the unit interval [0, 1]: the first coordinate p is the location of the point on that interval, and the second coordinate (1-p) is the remaining length of the interval. A point on the 2-simplex is therefore just a single probability. It specifies a Bernoulli distribution, which is used for statistical inference on binary, or yes-no, questions.
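
The K=2 case is simple enough to write down directly. In this sketch, the function name `bernoulli_point` is an illustrative choice; it maps a single probability p to the corresponding pair of coordinates.

```python
def bernoulli_point(p):
    """Map a probability p in [0, 1] to the point (p, 1 - p) on the 2-simplex."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in the unit interval [0, 1]")
    return (p, 1.0 - p)

point = bernoulli_point(0.6)    # the K=2 example (0.6, 0.4), up to rounding
yes_only = bernoulli_point(1.0)  # one end of the interval: all belief on "yes"
no_only = bernoulli_point(0.0)   # the other end: all belief on "no"
```

The two ends of the interval are the two corners of the 2-simplex, and each of them on its own is a copy of the one-point K=1 case.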

Footnotes

*Not to be confused with the special mathematical meaning of category in category theory; here the term is used in the sense of a categorical distribution.

**In some treatments, the triangle is called a 2-simplex instead of a 3-simplex because it’s 2-dimensional. I prefer to call it a 3-simplex because it is embedded in 3-dimensional space and it also uses 3 numbers to represent it (even though 2 of those numbers would suffice to find the 3rd).
