What’s the difference between the Sigmoid and Softmax activation functions?

Mike Wang
2 min read · Jun 1, 2019

Logistic Regression is a standard technique for classification problems. It applies a Sigmoid or Softmax function to the output of a linear model to turn raw scores into class probabilities. In this post, I attempt to answer:

  • Which activation function should you choose: Sigmoid or Softmax?

1. Which activation function should you choose: Sigmoid or Softmax?

These activation functions map their inputs to values between 0 and 1 that can be read as probabilities. This property is quite useful for classification problems, where the output represents the probability that the input belongs to one of K classes.

Sigmoid Function

The Sigmoid function is an S-shaped function between 0 and 1 defined by the equation below:

σ(x) = 1 / (1 + e^(−x))

The Sigmoid Function
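As a minimal sketch (function names here are illustrative, not from the original post), the Sigmoid function can be written with NumPy:

```python
import numpy as np

def sigmoid(x):
    # Maps any real number to the open interval (0, 1):
    # sigmoid(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))  # 0.5, the midpoint of the S-curve
```

Note the symmetry sigmoid(−x) = 1 − sigmoid(x), which is why a single output unit suffices for two classes: the second class's probability is just the complement.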

Softmax Function

The Softmax Function normalizes a set of K real numbers into a probability distribution such that they sum up to 1. As the name suggests, Softmax is a soft version of the max() function.

softmax(xᵢ) = e^(xᵢ) / Σⱼ e^(xⱼ), for i = 1, …, K

The Softmax Function
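A short sketch of Softmax in NumPy (the max-subtraction step is a standard numerical-stability trick, not part of the definition; it leaves the result unchanged because it cancels in the ratio):

```python
import numpy as np

def softmax(x):
    # Shift by the max so np.exp never overflows on large inputs;
    # the shift cancels out in the normalization.
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([1.0, 2.0, 3.0])
probs = softmax(scores)
print(probs, probs.sum())  # K probabilities that sum to 1
```

The largest score gets the largest probability, but unlike a hard max(), every entry keeps some probability mass, hence "soft" max.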

For K = 2, the Softmax function reduces to the Sigmoid function applied to the difference of the two inputs. In general, the Sigmoid function is used for binary classification problems and the Softmax function is used for multi-class classification problems.
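This equivalence is easy to check numerically (a sketch assuming NumPy; the variable names are illustrative). With two logits z₀ and z₁, Softmax assigns class 0 the probability e^(z₀) / (e^(z₀) + e^(z₁)) = 1 / (1 + e^(−(z₀ − z₁))), which is exactly the Sigmoid of the logit difference:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

z = np.array([1.3, -0.4])          # two arbitrary class logits
p_softmax = softmax(z)[0]          # class-0 probability via Softmax
p_sigmoid = sigmoid(z[0] - z[1])   # Sigmoid of the logit difference
print(p_softmax, p_sigmoid)        # the two values agree
```

This is why binary classifiers typically use a single Sigmoid output rather than a two-way Softmax: one logit (the difference) carries the same information as two.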

