Multivariate logistic regression is logistic regression with several predictors feeding a single yes/no outcome, and it is the everyday workhorse for questions like “given a borrower’s income, age, and credit score, how likely are they to default?” This guide untangles the confusing terminology, shows the formula, and walks through interpreting the coefficients when more than one variable is in play.

What is multivariate logistic regression?
At its core, multivariate logistic regression extends simple (one-predictor) logistic regression by adding more input variables. Instead of a single feature deciding the probability of an event, several features combine inside the same sigmoid. The model is:
$$p = \frac{1}{1+e^{-(\beta_0+\beta_1 x_1+\beta_2 x_2+\cdots+\beta_k x_k)}}$$Here $p$ is the probability of the positive class (default = yes, disease = present, churn = will churn), $x_1$ through $x_k$ are the $k$ predictors, $\beta_0$ is the intercept, and each $\beta_j$ is the weight on predictor $x_j$. The linear part $z = \beta_0+\beta_1 x_1+\cdots+\beta_k x_k$ is exactly the same kind of expression you would write for multiple binary logistic regression; the sigmoid then squeezes it into the range $(0,1)$ so the output is always a valid probability.
The terminology mess: multivariate vs multivariable vs multiple
Search results for this topic are a minefield because three similar words get used for the same model — and one of them is, strictly speaking, “wrong.” Here is what each term actually means so you can read papers without getting lost:
| Term you see | Strict statistical meaning | How people actually use it |
|---|---|---|
| Multiple logistic regression | Many predictors, ONE outcome | Correct and unambiguous — the safest term |
| Multivariable logistic regression | Many predictors, ONE outcome | Also correct; preferred in modern medical guidelines |
| Multivariate logistic regression | Technically: many OUTCOMES (several dependent variables) modeled jointly | In practice (especially epidemiology and medicine) it means many predictors, one outcome — the same as the two above |
So when a journal article reports a “multivariate logistic regression,” it almost always means a single binary outcome predicted from several variables. The truly multivariate case — two or more dependent variables modeled at once — is rare and usually labeled explicitly (for example, multinomial or multivariate probit models). For everyday machine learning, treat “multivariate,” “multivariable,” and “multiple” logistic regression as synonyms for “logistic regression with multiple predictors.”
Interpreting coefficients with several predictors
This is the reason researchers reach for the model in the first place. With one predictor, $\beta_1$ told you the effect of that single variable. With multivariate logistic regression, each coefficient is interpreted conditionally:
- $\beta_j$ is the change in the log-odds of the outcome for a one-unit increase in $x_j$, holding all other predictors constant.
- $e^{\beta_j}$ is the adjusted odds ratio — how the odds multiply per unit of $x_j$, after adjusting for every other variable in the model.
The phrase “holding the others constant” is the whole point. It lets you isolate the effect of one variable while adjusting for confounders. If older borrowers also tend to have higher incomes, a model with both variables can separate the independent contribution of age from the independent contribution of income — something a one-variable model cannot do. That is why you will hear the output described as an adjusted odds ratio rather than a raw odds ratio.
A worked example: predicting loan default
Suppose a lender fits a multivariate logistic regression to predict whether a borrower defaults (1) or repays (0) using three predictors: annual income in tens of thousands ($x_1$), age in years ($x_2$), and credit score in hundreds of points ($x_3$). The fitted model is:
$$z = 2.0 – 0.30\,x_1 – 0.04\,x_2 – 1.10\,x_3$$Take a borrower with income $x_1 = 6$ (i.e. \$60,000), age $x_2 = 40$, and credit score $x_3 = 7$ (i.e. 700). Plug in:
$$z = 2.0 – 0.30(6) – 0.04(40) – 1.10(7) = 2.0 – 1.8 – 1.6 – 7.7 = -9.1$$ $$p = \frac{1}{1+e^{-(-9.1)}} = \frac{1}{1+e^{9.1}} \approx 0.0001$$So this borrower has about a 0.01% predicted probability of default — reassuringly low. Now read the coefficients as adjusted odds ratios:
| Predictor | Coefficient $\beta_j$ | Adjusted odds ratio $e^{\beta_j}$ | Meaning (others held constant) |
|---|---|---|---|
| Income (+\$10k) | $-0.30$ | $0.74$ | Each extra \$10k cuts the odds of default by about 26% |
| Age (+1 year) | $-0.04$ | $0.96$ | Each extra year lowers default odds by about 4% |
| Credit score (+100 pts) | $-1.10$ | $0.33$ | Each extra 100 points cuts default odds by about 67% |
Notice that credit score has by far the strongest effect, and that every odds ratio is reported after adjusting for the other two variables. That adjustment is exactly what separates multivariate logistic regression from running three separate single-predictor models.
When to use multivariate logistic regression
Reach for it whenever your outcome is binary and you believe more than one factor drives it — which is almost always true in the real world. Typical triggers:
- You need to adjust for confounders. Medicine and epidemiology use it to estimate the effect of a treatment while controlling for age, sex, and comorbidities.
- You want a probability, not just a class. Credit scoring, churn prediction, and risk models all need calibrated probabilities from many inputs.
- You want interpretable weights. Unlike many black-box models, each coefficient maps directly to an adjusted odds ratio you can explain to a stakeholder.
Assumptions to keep in mind
Adding predictors brings one assumption to the front: no severe multicollinearity. If two predictors are highly correlated (say income and credit score move together), the model struggles to assign credit between them, the coefficients become unstable, and their standard errors balloon. Check this with a correlation matrix or variance inflation factors (VIF) before trusting individual coefficients. The other standard assumptions still apply: a binary outcome, independent observations, a linear relationship between each predictor and the log-odds, and a reasonably large sample (a common rule of thumb is at least 10 outcome events per predictor).
🤖 ML context
Multivariate logistic regression is the simplest multi-feature classifier in supervised learning, and it is mathematically a single neuron with a sigmoid activation over many inputs — the building block of every neural network. The weighted-sum-then-sigmoid pattern you see here scales straight up to deep models. Build intuition with the logistic regression calculator, then compare with binary logistic regression.
Frequently asked questions
What is multivariate logistic regression?
What is the difference between multivariate and multivariable logistic regression?
How do you interpret coefficients in multivariate logistic regression?
What is an adjusted odds ratio?
Why does multicollinearity matter in multivariate logistic regression?
Key takeaways
Multivariate logistic regression is simply logistic regression with multiple predictors driving one binary outcome through the sigmoid. The terminology is messy — “multivariate,” “multivariable,” and “multiple” are used interchangeably even though “multivariate” technically means many outcomes — but the model itself is straightforward. Each coefficient becomes an adjusted odds ratio, isolating one variable’s effect while controlling for the rest, which is exactly why researchers use it to adjust for confounders. Continue with the logistic regression calculator, the guide to odds ratio interpretation, or the formal reference on Wikipedia.