A categorical distribution is just a probability distribution over a finite number of categories. As one of the simplest distributions, a categorical distribution can be represented by a finite sequence of numbers that add up to 1.
Usually, the number of categories is taken to be K, and can be ordered from 0 to K-1. The ordering does not in general have any meaning in terms of the relationship between categories - but is simply a method to name them. Under this convention, a categorical distribution over a space of K categories can be specified with K non-negative numbers that add up to 1.
Categorical distributions are among the most common probability measures, and can be used to model many situations. One drawback is that no internal structure is given to the categories. If the number of categories is small, simply listing out the probabilities or searching the space is simple, but if it is exponentially large, then that lack of structure could make the space unwieldy.
Unnormalized categorical distributions can be represented by K numbers that don’t necessarily add up to 1. The unnormalized categorical distribution can be used to determine the ratio of 2 probabilities without normalization. Unlike some distributions, the normalization constant can be easily be found by taking the sum of all the values.
A categorical distribution is the most general probability distribution over the finite mathematical space of K objects. The set of all possible categorical distributions of K objects also forms a mathematical space, called a simplex. Points on a simplex are simply K non-negative numbers that add up to 1. When K=2, this space is a line, when K=3 it is a triangle, and when K=4 it is a tetrahedron. In each case, the simplex is a (K-1) dimensional object. (The requirement that the numbers sum to 1 reduced the dimensionality by 1).
A multinomial distribution is similar to a categorical distribution (and can be specified with the same probabilities), except that a certain number of values will be drawn from the distribution at once (say n categories) and the multinomial distribution looks at how the n draws are likely to distribute among the K categories.