Why Do We Need the Softmax Function?

What Are Logits?

How Do We Convert Logits to Probabilities?

Why Exponentials?

One Possible Solution: Sigmoid

Another Possible Solution: Laplace CDF

Practical Applications

Numerical Stability Concerns