Multivariate Logistic Regression

Multivariate logistic regression is logistic regression with several predictors feeding a single yes/no outcome, and it is the everyday workhorse for questions like “given a borrower’s income, age, and credit score, how likely are they to default?” This guide untangles the confusing terminology, shows the formula, and walks through interpreting the coefficients when more than one variable is in play.

multivariate logistic regression with several predictors
Multiple predictors feed one binary outcome through the sigmoid.

What is multivariate logistic regression?

At its core, multivariate logistic regression extends simple (one-predictor) logistic regression by adding more input variables. Instead of a single feature deciding the probability of an event, several features combine inside the same sigmoid. The model is:

$$p = \frac{1}{1+e^{-(\beta_0+\beta_1 x_1+\beta_2 x_2+\cdots+\beta_k x_k)}}$$

Here $p$ is the probability of the positive class (default = yes, disease = present, churn = will churn), $x_1$ through $x_k$ are the $k$ predictors, $\beta_0$ is the intercept, and each $\beta_j$ is the weight on predictor $x_j$. The linear part $z = \beta_0+\beta_1 x_1+\cdots+\beta_k x_k$ is exactly the same kind of expression you would write for multiple binary logistic regression; the sigmoid then squeezes it into the range $(0,1)$ so the output is always a valid probability.

In one line Multivariate logistic regression = the sigmoid function applied to a weighted sum of many predictors, producing the probability of a single binary outcome. Add predictors, the algebra barely changes; the interpretation is where the interesting work happens.

The terminology mess: multivariate vs multivariable vs multiple

Search results for this topic are a minefield because three similar words get used for the same model — and one of them is, strictly speaking, “wrong.” Here is what each term actually means so you can read papers without getting lost:

Term you seeStrict statistical meaningHow people actually use it
Multiple logistic regressionMany predictors, ONE outcomeCorrect and unambiguous — the safest term
Multivariable logistic regressionMany predictors, ONE outcomeAlso correct; preferred in modern medical guidelines
Multivariate logistic regressionTechnically: many OUTCOMES (several dependent variables) modeled jointlyIn practice (especially epidemiology and medicine) it means many predictors, one outcome — the same as the two above

So when a journal article reports a “multivariate logistic regression,” it almost always means a single binary outcome predicted from several variables. The truly multivariate case — two or more dependent variables modeled at once — is rare and usually labeled explicitly (for example, multinomial or multivariate probit models). For everyday machine learning, treat “multivariate,” “multivariable,” and “multiple” logistic regression as synonyms for “logistic regression with multiple predictors.”

📊 The quick ruleCount the outcomes, not the predictors. One binary outcome plus many predictors is what the vast majority of “multivariate logistic regression” articles describe, even though the prefix “multi-variate” literally points to many variates (outcomes).

Interpreting coefficients with several predictors

This is the reason researchers reach for the model in the first place. With one predictor, $\beta_1$ told you the effect of that single variable. With multivariate logistic regression, each coefficient is interpreted conditionally:

  • $\beta_j$ is the change in the log-odds of the outcome for a one-unit increase in $x_j$, holding all other predictors constant.
  • $e^{\beta_j}$ is the adjusted odds ratio — how the odds multiply per unit of $x_j$, after adjusting for every other variable in the model.

The phrase “holding the others constant” is the whole point. It lets you isolate the effect of one variable while adjusting for confounders. If older borrowers also tend to have higher incomes, a model with both variables can separate the independent contribution of age from the independent contribution of income — something a one-variable model cannot do. That is why you will hear the output described as an adjusted odds ratio rather than a raw odds ratio.

💡 Sign readingA positive $\beta_j$ means the predictor raises the probability of the outcome (odds ratio above 1); a negative $\beta_j$ lowers it (odds ratio below 1); $\beta_j = 0$ means no effect (odds ratio of exactly 1) once the other predictors are accounted for.

A worked example: predicting loan default

Suppose a lender fits a multivariate logistic regression to predict whether a borrower defaults (1) or repays (0) using three predictors: annual income in tens of thousands ($x_1$), age in years ($x_2$), and credit score in hundreds of points ($x_3$). The fitted model is:

$$z = 2.0 – 0.30\,x_1 – 0.04\,x_2 – 1.10\,x_3$$

Take a borrower with income $x_1 = 6$ (i.e. \$60,000), age $x_2 = 40$, and credit score $x_3 = 7$ (i.e. 700). Plug in:

$$z = 2.0 – 0.30(6) – 0.04(40) – 1.10(7) = 2.0 – 1.8 – 1.6 – 7.7 = -9.1$$ $$p = \frac{1}{1+e^{-(-9.1)}} = \frac{1}{1+e^{9.1}} \approx 0.0001$$

So this borrower has about a 0.01% predicted probability of default — reassuringly low. Now read the coefficients as adjusted odds ratios:

PredictorCoefficient $\beta_j$Adjusted odds ratio $e^{\beta_j}$Meaning (others held constant)
Income (+\$10k)$-0.30$$0.74$Each extra \$10k cuts the odds of default by about 26%
Age (+1 year)$-0.04$$0.96$Each extra year lowers default odds by about 4%
Credit score (+100 pts)$-1.10$$0.33$Each extra 100 points cuts default odds by about 67%

Notice that credit score has by far the strongest effect, and that every odds ratio is reported after adjusting for the other two variables. That adjustment is exactly what separates multivariate logistic regression from running three separate single-predictor models.

When to use multivariate logistic regression

Reach for it whenever your outcome is binary and you believe more than one factor drives it — which is almost always true in the real world. Typical triggers:

  1. You need to adjust for confounders. Medicine and epidemiology use it to estimate the effect of a treatment while controlling for age, sex, and comorbidities.
  2. You want a probability, not just a class. Credit scoring, churn prediction, and risk models all need calibrated probabilities from many inputs.
  3. You want interpretable weights. Unlike many black-box models, each coefficient maps directly to an adjusted odds ratio you can explain to a stakeholder.

Assumptions to keep in mind

Adding predictors brings one assumption to the front: no severe multicollinearity. If two predictors are highly correlated (say income and credit score move together), the model struggles to assign credit between them, the coefficients become unstable, and their standard errors balloon. Check this with a correlation matrix or variance inflation factors (VIF) before trusting individual coefficients. The other standard assumptions still apply: a binary outcome, independent observations, a linear relationship between each predictor and the log-odds, and a reasonably large sample (a common rule of thumb is at least 10 outcome events per predictor).

⚠ Multicollinearity warningHigh correlation among predictors does not hurt overall prediction much, but it wrecks the interpretation of individual coefficients — the very thing multivariate logistic regression is prized for. If two variables overlap heavily, consider dropping one or combining them.

🤖 ML context

Multivariate logistic regression is the simplest multi-feature classifier in supervised learning, and it is mathematically a single neuron with a sigmoid activation over many inputs — the building block of every neural network. The weighted-sum-then-sigmoid pattern you see here scales straight up to deep models. Build intuition with the logistic regression calculator, then compare with binary logistic regression.

Frequently asked questions

What is multivariate logistic regression?
It is logistic regression with several predictors feeding a single binary outcome. A weighted sum of the predictors is passed through the sigmoid function to produce the probability of the positive class.
What is the difference between multivariate and multivariable logistic regression?
Strictly, multivariable means many predictors and one outcome, while multivariate means many outcomes. In practice, especially in medicine and epidemiology, people use multivariate to mean many predictors and one outcome, so the two are usually treated as the same model.
How do you interpret coefficients in multivariate logistic regression?
Each coefficient is the change in the log-odds of the outcome per one-unit increase in that predictor, holding all the other predictors constant. Exponentiating the coefficient gives the adjusted odds ratio for that variable.
What is an adjusted odds ratio?
It is the odds ratio for one predictor after accounting for every other predictor in the model. It tells you the independent effect of that variable, which is why it is used to adjust for confounders.
Why does multicollinearity matter in multivariate logistic regression?
When predictors are highly correlated, the model cannot cleanly separate their individual effects. Coefficients become unstable and standard errors grow, which undermines the interpretation of each adjusted odds ratio. Check it with a correlation matrix or variance inflation factors.

Key takeaways

Multivariate logistic regression is simply logistic regression with multiple predictors driving one binary outcome through the sigmoid. The terminology is messy — “multivariate,” “multivariable,” and “multiple” are used interchangeably even though “multivariate” technically means many outcomes — but the model itself is straightforward. Each coefficient becomes an adjusted odds ratio, isolating one variable’s effect while controlling for the rest, which is exactly why researchers use it to adjust for confounders. Continue with the logistic regression calculator, the guide to odds ratio interpretation, or the formal reference on Wikipedia.

Scroll to Top