Ordinal logistic regression is the model you reach for when your outcome has three or more ordered categories — low/medium/high, disagree/neutral/agree, or a survey rating from 1 to 5 — and you want to use that ordering instead of throwing it away. This guide explains the proportional-odds model, the cumulative-logit math, a worked example, and exactly when to pick it over its cousins.

What is ordinal logistic regression?
Ordinal logistic regression — also called the ordered logit or proportional odds model — predicts an outcome that falls into one of several ordered categories. Ordinary binary logistic regression handles a two-way yes/no split. Ordinal logistic regression generalizes it to $J$ ordered levels, such as a satisfaction rating of low, medium, or high, where high sits above medium which sits above low. The order is real information, and the model is built to respect it.
The trick is to avoid modeling each category in isolation. Instead, ordinal logistic regression cuts the ordered scale at every boundary between adjacent categories and models the cumulative probability of being at or below each cut. One set of slopes is shared across all those cuts, which is what makes the model compact and interpretable. You can experiment with the binary building block in our logistic regression calculator before stacking up the ordered version.
The proportional-odds (cumulative-logit) model
Let the outcome $Y$ take ordered values $1, 2, \dots, J$. For each cut point $j$ (from $1$ up to $J-1$), the model writes the log-odds of being in category $j$ or lower as:
$$\ln\!\left(\frac{P(Y \le j)}{P(Y > j)}\right) = \alpha_j - \beta_1 x_1 - \beta_2 x_2 - \cdots - \beta_k x_k$$
Read the pieces carefully. The left side is a cumulative logit: the log-odds of landing at or below category $j$. Each $\alpha_j$ is a threshold (a category-specific intercept), and because the categories are ordered, the thresholds are too: $\alpha_1 < \alpha_2 < \cdots < \alpha_{J-1}$. The slopes $\beta_1, \dots, \beta_k$ are shared across every threshold, so there is only one $\beta$ per predictor, not a different one at each cut. That single shared-slope rule is the famous proportional-odds assumption.
The minus sign in front of the $\beta x$ terms is a convention that makes interpretation pleasant: a positive $\beta$ means larger $x$ pushes the outcome toward higher categories. With the cumulative form in hand, the probability of any single category is just a difference of two adjacent cumulative probabilities, $P(Y = j) = P(Y \le j) - P(Y \le j-1)$.
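Solving the cumulative logit for the probability makes that subtraction explicit. The shorthand $\eta = \beta_1 x_1 + \cdots + \beta_k x_k$ and the boundary conventions $P(Y \le 0) = 0$ and $P(Y \le J) = 1$ are notation introduced here for convenience, not part of the model statement above:

$$P(Y \le j) = \frac{1}{1 + e^{-(\alpha_j - \eta)}}, \qquad j = 1, \dots, J-1$$

$$P(Y = j) = \frac{1}{1 + e^{-(\alpha_j - \eta)}} - \frac{1}{1 + e^{-(\alpha_{j-1} - \eta)}}, \qquad 1 < j < J$$

$$P(Y = 1) = \frac{1}{1 + e^{-(\alpha_1 - \eta)}}, \qquad P(Y = J) = 1 - \frac{1}{1 + e^{-(\alpha_{J-1} - \eta)}}$$

Because the thresholds are strictly increasing, each difference is positive, so the $J$ category probabilities are valid and sum to one.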
Checking the proportional-odds assumption
Because the model leans on one shared slope per predictor, you should sanity-check that the assumption is reasonable rather than just assume it. Three practical checks:
- Fit separate binary models at each cut. Collapse the outcome into “at or below $j$” versus “above $j$” for each threshold and fit a plain logistic regression. If the slope for a predictor is roughly stable across those cuts, proportional odds is plausible.
- Run a formal test. The Brant test (or a likelihood-ratio test against a model with cut-specific slopes) flags predictors whose effect drifts across thresholds. A small p-value warns the assumption is shaky.
- Plot the cumulative logits. Draw one empirical cumulative-logit curve per cut against the levels of a predictor. If the curves stay roughly parallel, the proportional-odds picture holds; curves that clearly converge, diverge, or cross suggest it does not.
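The first check can be sketched in a few lines of standard-library Python. Everything here is illustrative: the data are simulated from a proportional-odds model with made-up parameters, and `simulate_ordinal` and `fit_binary_logit` are hypothetical helpers written for this sketch (a hand-rolled Newton-Raphson fit for one predictor), not functions from any library. With real data you would normally use a statistics package for the binary fits.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate_ordinal(n, alphas, beta, seed=0):
    """Draw (x, y) pairs from a one-predictor proportional-odds model."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u = rng.random()
        # P(Y <= j) = sigmoid(alpha_j - beta * x). y is the first category
        # whose cumulative probability exceeds u, else the top category.
        y = len(alphas) + 1
        for j, a in enumerate(alphas, start=1):
            if u < sigmoid(a - beta * x):
                y = j
                break
        xs.append(x)
        ys.append(y)
    return xs, ys


def fit_binary_logit(xs, ts, iters=25, tol=1e-10):
    """Fit logit P(t = 1) = b0 + b1 * x by Newton-Raphson; return (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t in zip(xs, ts):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += t - p            # gradient w.r.t. intercept
            g1 += (t - p) * x      # gradient w.r.t. slope
            h00 += w               # entries of the 2x2 information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information matrix) * delta = gradient.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Assumed, illustrative parameters: two thresholds and one shared slope.
alphas, beta = [-1.2, 1.0], 0.8
xs, ys = simulate_ordinal(4000, alphas, beta, seed=42)

slopes = []
for j in range(1, len(alphas) + 1):
    # Collapse the outcome at cut j: 1 if above j, 0 if at or below j.
    ts = [1 if y > j else 0 for y in ys]
    b0, b1 = fit_binary_logit(xs, ts)
    slopes.append(b1)
    print(f"cut {j}: intercept = {b0:.3f}, slope = {b1:.3f}")
```

Each binary model targets "above $j$", so its slope estimates $\beta$ directly and its intercept estimates $-\alpha_j$. Slopes that land close to each other across cuts are what proportional odds predicts; a slope that drifts steadily from one cut to the next is the warning sign.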
Worked example: predicting satisfaction
Suppose we predict a customer satisfaction rating with three ordered levels — low (1), medium (2), high (3) — from a single predictor $x$, the number of support interactions resolved on first contact. Fitting an ordinal logistic regression returns two thresholds and one slope:
$$\alpha_1 = -1.2, \qquad \alpha_2 = 1.0, \qquad \beta_1 = 0.8$$
The shared slope $\beta_1 = 0.8$ gives an odds ratio of $e^{0.8} \approx 2.23$. Interpret it as the effect on the odds of landing in a higher satisfaction category: each additional first-contact resolution multiplies the odds of being in a higher rating bucket by about $2.23$, and this holds at both the low-vs-(medium+high) cut and the (low+medium)-vs-high cut.
To get an actual probability, plug a value of $x$ into the cumulative logits. For $x = 2$:
$$P(Y \le 1) = \frac{1}{1 + e^{-(\alpha_1 - \beta_1 x)}} = \frac{1}{1 + e^{-(-1.2 - 1.6)}} \approx 0.057$$
$$P(Y \le 2) = \frac{1}{1 + e^{-(\alpha_2 - \beta_1 x)}} = \frac{1}{1 + e^{-(1.0 - 1.6)}} \approx 0.354$$
From these cumulatives, the per-category probabilities follow by subtraction: $P(Y=1) \approx 0.057$, $P(Y=2) = 0.354 - 0.057 \approx 0.297$, and $P(Y=3) = 1 - 0.354 \approx 0.646$. So a customer with two first-contact resolutions is most likely to be highly satisfied, which is exactly the kind of ordered, probabilistic answer ordinal logistic regression is built to give.
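The arithmetic above is easy to reproduce. The sketch below uses only the fitted values from this example; the function names `cumulative_probs`, `category_probs`, and `odds_above` are hypothetical helpers chosen for illustration.

```python
import math


def cumulative_probs(alphas, beta, x):
    """P(Y <= j) for j = 1..J-1 under logit P(Y <= j) = alpha_j - beta * x."""
    return [1.0 / (1.0 + math.exp(-(a - beta * x))) for a in alphas]


def category_probs(alphas, beta, x):
    """P(Y = j) for j = 1..J as differences of adjacent cumulative probabilities."""
    cum = [0.0] + cumulative_probs(alphas, beta, x) + [1.0]
    return [hi - lo for lo, hi in zip(cum, cum[1:])]


def odds_above(alpha, beta, x):
    """Odds of landing above the cut with threshold alpha: P(Y > j) / P(Y <= j)."""
    p_le = 1.0 / (1.0 + math.exp(-(alpha - beta * x)))
    return (1.0 - p_le) / p_le


alphas = [-1.2, 1.0]   # thresholds from the worked example
beta = 0.8             # shared slope from the worked example

probs = category_probs(alphas, beta, x=2)
print([round(p, 3) for p in probs])   # [0.057, 0.297, 0.646]

# One extra first-contact resolution (x = 2 -> 3) multiplies the odds of a
# higher rating by the same factor at both cuts.
ratios = [odds_above(a, beta, 3) / odds_above(a, beta, 2) for a in alphas]
print([round(r, 2) for r in ratios])  # [2.23, 2.23]
```

The second printout is the proportional-odds assumption in miniature: the ratio equals $e^{0.8}$ at both cuts because the threshold cancels when the two odds are divided.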
Ordinal vs multinomial vs binary logistic regression
All three are logistic models; they differ in how many classes the outcome has and whether those classes are ordered. The table makes the choice concrete:
| Model | Outcome ordered? | Number of classes | What it estimates |
|---|---|---|---|
| Binary logistic | N/A (just two) | 2 | One intercept and one slope set; a single logit |
| Ordinal logistic | Yes, ordered | 3 or more | $J-1$ thresholds, one shared slope set (proportional odds) |
| Multinomial logistic | No, unordered | 3 or more | Separate slope set for each class vs a reference |
The key contrast is the slope count. Ordinal logistic regression spends one slope per predictor plus $J-1$ thresholds; multinomial logistic regression spends a full intercept and slope set for every class other than the reference. Ordinal is leaner precisely because it borrows strength from the ordering.
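To see how quickly the gap grows, here is a small counting sketch. It assumes $J$ classes, $k$ predictors, and a baseline-category multinomial model with one intercept and one slope set per non-reference class; the function names are hypothetical.

```python
def ordinal_param_count(n_classes, n_predictors):
    """J - 1 thresholds plus one shared slope per predictor."""
    return (n_classes - 1) + n_predictors


def multinomial_param_count(n_classes, n_predictors):
    """One intercept and one full slope set for each non-reference class."""
    return (n_classes - 1) * (1 + n_predictors)


for n_classes, n_predictors in [(3, 1), (5, 4), (7, 10)]:
    print(
        f"J={n_classes}, k={n_predictors}: "
        f"ordinal={ordinal_param_count(n_classes, n_predictors)}, "
        f"multinomial={multinomial_param_count(n_classes, n_predictors)}"
    )
```

With five classes and four predictors, for instance, the ordinal model estimates 8 parameters against 20 for the multinomial one.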
When to use ordinal vs multinomial
The decision rule is short: use ordinal logistic regression when the order of the categories carries information, and use multinomial when it does not.
- Are the categories genuinely ordered? Ratings, agreement scales, severity levels (mild/moderate/severe) → ordinal logistic regression.
- Are they just labels with no natural ranking? Predicting which of red/green/blue, or which product category → multinomial logistic regression.
- Are there exactly two outcomes? → plain binary logistic regression.
Ignoring order has a real cost. If you feed an ordered outcome into a multinomial model, you throw away the ranking and lose statistical power — you estimate far more parameters than necessary and your coefficients become harder to interpret. When order is meaningful, the proportional-odds model gives you a tighter, more interpretable fit with a single odds ratio per predictor.
🤖 ML context
Ordinal logistic regression is a close relative of the generalized linear models behind binary and multinomial logistic regression. It keeps the same linear predictor and the same logit link, but applies the link to cumulative probabilities rather than to a single class probability. It shows up across supervised learning wherever targets are ordered ratings: review scores, credit grades, disease staging. Master the binary case first with the logistic regression calculator, then review the shared logistic regression assumptions before trusting an ordinal fit.
Key takeaways
Ordinal logistic regression models an ordered outcome by stacking cumulative logits: $J-1$ thresholds capture where the category boundaries sit, while a single shared slope per predictor captures the effect, summarized by one odds ratio under the proportional-odds assumption. Check that assumption, and if it holds you get a compact, interpretable model that respects the ordering instead of discarding it. Continue with the logistic regression calculator, compare it against multinomial logistic regression, review the logistic regression assumptions, or read the formal reference on Wikipedia.