Black Swans


k_klapsas


You learn $p(x_i \mid x_1, \ldots, x_{i-1})$, which is a supervised problem: for each prefix $x_1, \ldots, x_{i-1}$ in the dataset you already know the value of $x_i$ that followed it, so the prefix is the input and $x_i$ is the label.
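To make that concrete, here is a minimal sketch of how a sequence turns into (input, label) pairs. The names `next_token_pairs` and `fit_conditionals` and the toy data are made up for illustration, and the "model" is just a count table rather than whatever estimator you would actually train:

```python
from collections import Counter, defaultdict

def next_token_pairs(sequence):
    """Turn one sequence into supervised pairs: prefix x_1..x_{i-1} -> label x_i."""
    return [(tuple(sequence[:i]), sequence[i]) for i in range(len(sequence))]

def fit_conditionals(sequences):
    """Estimate p(x_i | prefix) by counting which value followed each prefix."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prefix, target in next_token_pairs(seq):
            counts[prefix][target] += 1
    return {
        prefix: {value: n / sum(c.values()) for value, n in c.items()}
        for prefix, c in counts.items()
    }

# Toy dataset (hypothetical): three short sequences.
data = [["a", "b", "c"], ["a", "b", "d"], ["a", "c", "c"]]
model = fit_conditionals(data)
print(next_token_pairs(data[0]))
print(model[("a",)])
```

A count table only works when prefixes repeat. In practice you would swap it for a classifier or a neural network that takes the prefix as input, but the training pairs are built the same way.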

An example of a categorical distribution is the roll of a fair die, which has 6 possible states with probability 1/6 each. But the sides of the die don't have to be labelled 1-6; they could be, e.g., red, blue, etc. In that case it doesn't make much sense to take the expected value, because the outcomes are labels rather than numbers. Another example common in machine learning is the distribution over the classes that an object might belong to.
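A small sketch of the difference, using only the Python standard library. The colour names and the helper `sample` are arbitrary choices for illustration:

```python
import random
from collections import Counter

# A die with numbered sides: the outcomes are numbers, so a mean is defined.
numbered = {side: 1 / 6 for side in range(1, 7)}
expected_value = sum(side * p for side, p in numbered.items())

# The same distribution with colour labels: same probabilities, but no mean,
# because there is nothing sensible to add or average.
coloured = {side: 1 / 6 for side in
            ["red", "blue", "green", "yellow", "purple", "orange"]}

def sample(probs, n, seed=0):
    """Draw n outcomes from a categorical distribution given as {label: prob}."""
    rng = random.Random(seed)
    labels = list(probs)
    weights = [probs[label] for label in labels]
    return rng.choices(labels, weights=weights, k=n)

draws = sample(coloured, 6000)
frequencies = Counter(draws)  # what you can still summarise: counts per label
print(expected_value)
print(frequencies.most_common())
```

For the coloured die you can still report the probabilities, the most likely label, or the empirical frequencies; those are the quantities a classifier's output distribution over classes gives you as well.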

You should check out this blogpost: Some layers are quite interpretable and some others are pretty much black boxes