Multinomial logistic regression is the model you reach for when the outcome has three or more unordered categories — like predicting whether an animal is a cat, a dog, or a bird — and it works by generalizing ordinary binary logistic regression with the softmax function. This guide explains the math, walks through a worked example, and shows exactly when to use it.

What is multinomial logistic regression?
Multinomial logistic regression (also called multiclass logistic regression or softmax regression) extends logistic regression to problems with more than two possible outcomes that have no natural order. Binary logistic regression answers a yes/no question; the multinomial version answers a “which one of these $K$ classes?” question. Classic examples include predicting the chosen transport mode among {car, bus, train}, the species among {cat, dog, bird}, or which product a customer buys from a catalogue of three.
The crucial requirement is that the categories are unordered (nominal). If the categories have a ranking — like {low, medium, high} — you should use ordinal logistic regression instead, which exploits that ordering. For the two-class case you simply fall back to binary logistic regression.
The softmax generalization
In binary logistic regression a single linear score is squeezed through the sigmoid. Multinomial logistic regression keeps one linear score per class and squeezes the whole set through the softmax function, which turns $K$ raw scores into $K$ probabilities that are all positive and sum to one:
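$$P(y=c\mid x)=\frac{e^{z_c}}{\sum_{j=1}^{K} e^{z_j}}, \qquad z_c=\beta_{c0}+\beta_{c1}x_1+\beta_{c2}x_2+\cdots$$

Here each class $c$ gets its own set of coefficients $\beta_{c0}, \beta_{c1}, \dots$, so $z_c$ is the linear score that class earns for a given input $x$. The exponentials make every score positive, and dividing by the sum $\sum_{j=1}^{K} e^{z_j}$ normalizes them into a valid probability distribution. The predicted class is simply the one with the highest probability. Because adding the same constant to every $z_c$ leaves these probabilities unchanged, the coefficients are only identified relative to one another; a common convention is to fix one class's coefficients at zero, and that class becomes the reference (baseline) category.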
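A minimal sketch of the softmax formula in plain Python (standard library only). The helper names `softmax` and `predict_class` are our own, not a library API. Subtracting the largest score before exponentiating is a standard numerical-stability trick; it is allowed because adding the same constant to every $z_c$ cancels between numerator and denominator.

```python
import math


def softmax(scores):
    """Turn K raw linear scores z_1..z_K into probabilities that are positive and sum to 1."""
    # Subtract the largest score first so math.exp never overflows;
    # the shift cancels in the ratio, so the probabilities are unchanged.
    m = max(scores)
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores):
    """Return the index of the class with the highest softmax probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=lambda c: probs[c])


print([round(p, 3) for p in softmax([2.0, 1.0, 0.1])])  # -> [0.659, 0.242, 0.099]
print(predict_class([2.0, 1.0, 0.1]))                    # -> 0
```

Because softmax preserves the ordering of the scores, `predict_class` would return the same index if it simply took the largest raw score; computing the probabilities matters when you need calibrated values, not just the label.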
Binary vs multinomial vs ordinal
The three flavours of logistic regression differ only in how many classes they handle and whether order matters. Here is the comparison in one table:
| Model | Number of classes | Ordered? | Core mechanism |
|---|---|---|---|
| Binary logistic regression | Exactly 2 | N/A | Sigmoid on one linear score |
| Multinomial logistic regression | 3 or more | No (nominal) | Softmax over $K$ linear scores |
| Ordinal logistic regression | 3 or more | Yes (ranked) | Cumulative log-odds with shared slope |
Worked example: predicting a customer’s chosen product
Imagine an online store with three products — A, B, and C — and you want to predict which one a customer buys from their age ($x_1$) and income ($x_2$). The outcome has three unordered categories, so multinomial logistic regression is the right tool. We pick product A as the reference class, so its coefficients are fixed to zero, and we estimate coefficients for B and C relative to A.
Suppose fitting the model produces the coefficients below, which give these linear scores for a customer with $x_1=35$ (age) and $x_2=6$ (income in tens of thousands):
- Reference class A: $z_A = 0$ by construction.
- Class B: $z_B = \beta_{B0}+\beta_{B1}x_1+\beta_{B2}x_2 = -2.0 + 0.04(35) + 0.10(6) = 0.0$.
- Class C: $z_C = \beta_{C0}+\beta_{C1}x_1+\beta_{C2}x_2 = -4.0 + 0.02(35) + 0.50(6) = -0.3$.
Now apply softmax. The denominator is $e^{0}+e^{0}+e^{-0.3}=1+1+0.741=2.741$, so:
$$P(A)=\frac{1}{2.741}=0.365,\quad P(B)=\frac{1}{2.741}=0.365,\quad P(C)=\frac{0.741}{2.741}=0.270$$

The three probabilities sum to 1, and the model predicts a tie between A and B as the most likely purchase. To interpret the coefficients, remember that A is the reference: $\beta_{C2}=0.50$ means that each extra unit of income (an additional 10,000), with age held fixed, multiplies the odds of choosing C over A by $e^{0.50}\approx 1.65$. In general, a positive coefficient reads as “this feature pushes the customer toward this class relative to the reference class, other features held constant,” which is why choosing a sensible baseline makes the whole model easier to explain.
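The arithmetic above can be reproduced in a few lines of plain Python. The coefficient values are the illustrative ones from this example, not estimates from real data, and `linear_scores` and `class_probabilities` are hypothetical helper names. The second half checks the odds-ratio interpretation numerically: raising income by one unit while holding age at 35 should multiply the odds of C over A by $e^{0.50}$.

```python
import math

# Illustrative coefficients from the worked example, ordered (intercept, age, income).
# Product A is the reference class, so its coefficients are fixed at zero.
COEFS = {
    "A": (0.0, 0.0, 0.0),
    "B": (-2.0, 0.04, 0.10),
    "C": (-4.0, 0.02, 0.50),
}


def linear_scores(age, income):
    """Compute z_c = beta_c0 + beta_c1 * age + beta_c2 * income for every class."""
    return {c: b0 + b1 * age + b2 * income for c, (b0, b1, b2) in COEFS.items()}


def class_probabilities(age, income):
    """Apply softmax to the linear scores of all classes."""
    z = linear_scores(age, income)
    denom = sum(math.exp(v) for v in z.values())
    return {c: math.exp(v) / denom for c, v in z.items()}


probs = class_probabilities(age=35, income=6)
print({c: round(p, 3) for c, p in probs.items()})
# -> {'A': 0.365, 'B': 0.365, 'C': 0.27}

# Odds of C over A before and after a one-unit income increase, age held fixed.
odds_before = probs["C"] / probs["A"]
after = class_probabilities(age=35, income=7)
odds_after = after["C"] / after["A"]
print(round(odds_after / odds_before, 3))
# -> 1.649
```

The ratio equals $e^{0.50}$ exactly (up to floating-point error) because the odds of C over A are $e^{z_C - z_A} = e^{z_C}$, and one more unit of income adds $\beta_{C2}=0.50$ to $z_C$ while leaving $z_A$ at zero.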
When to use multinomial logistic regression
Reach for multinomial logistic regression when both of these are true:
- The target has three or more categories. With exactly two, use binary logistic regression — the softmax with $K=2$ reduces to the ordinary sigmoid anyway.
- The categories are unordered (nominal). Brands, species, transport modes, and product choices have no inherent ranking, so each class deserves its own coefficients.
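The $K=2$ reduction is easy to check numerically. Dividing the top and bottom of $e^{z_1}/(e^{z_0}+e^{z_1})$ by $e^{z_1}$ gives $1/(1+e^{-(z_1-z_0)})$, the sigmoid of the score difference. The sketch below, with hypothetical helper names, compares the two expressions for a few arbitrary score pairs.

```python
import math


def two_class_softmax(z0, z1):
    """Probability of class 1 under softmax with K = 2 scores."""
    return math.exp(z1) / (math.exp(z0) + math.exp(z1))


def sigmoid(t):
    """Logistic (sigmoid) function used by binary logistic regression."""
    return 1.0 / (1.0 + math.exp(-t))


# Only the difference z1 - z0 matters, so both columns should match.
for z0, z1 in [(0.0, 1.5), (2.0, -1.0), (-3.2, -3.2)]:
    print(f"z0={z0}, z1={z1}: softmax={two_class_softmax(z0, z1):.6f}, "
          f"sigmoid={sigmoid(z1 - z0):.6f}")
```

Setting class 0 as the reference ($z_0=0$) leaves exactly the binary model $\sigma(z_1)$, which is why binary logistic regression needs only one coefficient vector.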
How it is trained
Like binary logistic regression, the multinomial model has no closed-form solution. It is fit by maximum likelihood, which is equivalent to minimizing the multiclass cross-entropy (log) loss, using gradient descent or a solver such as L-BFGS. The same softmax-plus-cross-entropy objective is what the softmax output layer of a modern neural network minimizes, which is why multinomial logistic regression is often described as a single-layer neural net with a softmax activation. You can build intuition for the binary backbone first with our logistic regression calculator.
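The fitting procedure can be sketched with plain batch gradient descent. This is an illustrative assumption, not how production libraries work: they typically use L-BFGS or similar solvers, add regularization, and check convergence. The dataset is made up, and `predict_proba`, `cross_entropy`, and `fit` are hypothetical helper names. The key fact the code relies on is that the derivative of the cross-entropy loss with respect to a class score $z_k$ is $p_k - \mathbb{1}[y=k]$, so each coefficient's gradient is that error times the matching input (with a leading 1 for the intercept). Following the reference-class convention used in this guide, class 0's coefficients stay fixed at zero.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(coefs, x):
    """coefs[c] = [beta_c0, beta_c1, ...]; x = [x1, x2, ...] without the intercept."""
    xb = [1.0] + list(x)  # prepend 1 so beta_c0 acts as the intercept
    scores = [sum(b * v for b, v in zip(beta, xb)) for beta in coefs]
    return softmax(scores)


def cross_entropy(coefs, X, y):
    """Average negative log-likelihood (multiclass log loss)."""
    return -sum(math.log(predict_proba(coefs, x)[c]) for x, c in zip(X, y)) / len(X)


def fit(X, y, n_classes, lr=0.3, n_iter=500):
    """Batch gradient descent; class 0 is the reference, so its coefficients stay at zero."""
    n_features = len(X[0]) + 1  # +1 for the intercept
    coefs = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(n_iter):
        grads = [[0.0] * n_features for _ in range(n_classes)]
        for x, c in zip(X, y):
            p = predict_proba(coefs, x)
            xb = [1.0] + list(x)
            for k in range(1, n_classes):  # skip the reference class
                err = p[k] - (1.0 if k == c else 0.0)  # d(loss)/d(z_k)
                for j in range(n_features):
                    grads[k][j] += err * xb[j] / len(X)
        for k in range(1, n_classes):
            for j in range(n_features):
                coefs[k][j] -= lr * grads[k][j]
    return coefs


# Tiny made-up dataset: one feature, three classes that overlap slightly.
X = [[-2.0], [-1.4], [-0.9], [-1.1], [-0.3], [0.2], [1.2], [0.9], [1.6], [2.1]]
y = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]

start = [[0.0, 0.0] for _ in range(3)]
coefs = fit(X, y, n_classes=3)
print("loss before:", round(cross_entropy(start, X, y), 4))
print("loss after: ", round(cross_entropy(coefs, X, y), 4))
print("P(class | x=1.8):", [round(p, 3) for p in predict_proba(coefs, [1.8])])
```

Starting from all-zero coefficients, every class gets probability 1/3, so the initial loss is $\ln 3 \approx 1.099$; with a step size this small the loss should fall steadily. Swapping in a quasi-Newton solver such as L-BFGS changes how fast you reach the minimum, not the objective being minimized.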
🤖 ML context
Multinomial logistic regression is the bridge from classic statistics to deep learning: its softmax output is the final layer of nearly every neural-network classifier, from image recognition to language models. Master the logistic regression hub and the leap from binary logistic regression to multiclass becomes one short step — you are simply stacking one linear model per class and normalizing with softmax.
Frequently asked questions
What is multinomial logistic regression?
How is multinomial logistic regression different from binary logistic regression?
What is the softmax function in multinomial logistic regression?
What is the reference category in multinomial logistic regression?
When should I use multinomial instead of ordinal logistic regression?
Key takeaways
Multinomial logistic regression generalizes binary logistic regression to three or more unordered classes by giving each class its own linear score and normalizing with softmax, $P(y=c\mid x)=e^{z_c}/\sum_j e^{z_j}$. Coefficients are read relative to a chosen reference class, and you can fit it jointly or approximate it with one-vs-rest. Continue with the logistic regression calculator, the two-class case in binary logistic regression, the ranked-class case in ordinal logistic regression, or the formal reference on Wikipedia.