Binary logistic regression is the standard machine learning model for predicting a two-class outcome — yes/no, pass/fail, spam/not spam — by estimating the probability that an example belongs to the positive class and then applying a threshold to make the call. It is the first classifier most beginners learn, and for good reason: it is simple, interpretable, and the foundation for almost everything that follows.

What is binary logistic regression?
Binary logistic regression is the logistic model applied to a target variable with exactly two possible outcomes, conventionally labeled $1$ (the positive class) and $0$ (the negative class). Instead of predicting a raw number like ordinary linear regression, it predicts $p$, the probability that the outcome is the positive class. To keep that probability inside a valid range, the model passes a linear combination of the inputs through the sigmoid (logistic) function:
$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$
The sigmoid is an S-shaped curve whose output always lands strictly between 0 and 1, so it can be read directly as a probability. Here $\beta_0$ is the intercept and $\beta_1$ is the coefficient on the input $x$. Add more features and the exponent simply becomes $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$, but the shape and idea stay the same.
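The sigmoid formula can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the names `sigmoid` and `predict_proba` are chosen here for the example, and the sign branch is an added precaution so that `math.exp` never receives a large positive argument.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score z to a value in (0, 1) in exact arithmetic."""
    # Branch on the sign of z so exp() is only ever called on a non-positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x: float, beta0: float, beta1: float) -> float:
    """Probability of the positive class for a single input x."""
    return sigmoid(beta0 + beta1 * x)
```

A score of zero maps to exactly 0.5, and very large positive or negative scores round to 1.0 or 0.0 in floating point even though the mathematical function never reaches those limits.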
From probability to class: the 0.5 threshold
The sigmoid gives you a probability, but a classifier ultimately has to output a label. That final step is the threshold. By default you predict class 1 when $p \ge 0.5$ and class 0 otherwise. Because the sigmoid equals exactly $0.5$ when its exponent is zero, the rule $p \ge 0.5$ is the same as $\beta_0 + \beta_1 x \ge 0$. The set of inputs where that score equals zero is the decision boundary separating the two predicted classes: a single point $x = -\beta_0/\beta_1$ with one feature, a line with two features, and a hyperplane in higher dimensions.
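The equivalence between thresholding the probability at 0.5 and checking the sign of the linear score can be sketched as follows. The function names are illustrative, and `boundary_1d` assumes a single feature with a nonzero coefficient.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def classify_by_probability(p: float, threshold: float = 0.5) -> int:
    """Predict class 1 when the probability reaches the threshold."""
    return 1 if p >= threshold else 0

def classify_by_score(z: float) -> int:
    """Same decision at the default 0.5 threshold, taken directly on the score."""
    return 1 if z >= 0 else 0

def boundary_1d(beta0: float, beta1: float) -> float:
    """Input value where beta0 + beta1 * x equals zero (single-feature case)."""
    if beta1 == 0:
        raise ValueError("beta1 is zero, so the score never crosses zero")
    return -beta0 / beta1
```

Working on the score skips the exponential entirely, which is convenient when only the label is needed. A threshold other than 0.5 shifts the boundary and no longer corresponds to a score of zero.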
The log-odds (logit) form
The sigmoid equation has an equivalent, often more revealing form. If you take the odds $\frac{p}{1-p}$ and apply the natural log, the model becomes perfectly linear:
$$\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$
The left side is the log-odds, also called the logit. This is the heart of why the algorithm carries the word “regression”: it performs a straight-line regression on the log-odds of the event. A one-unit increase in $x$ adds $\beta_1$ to the log-odds, which means it multiplies the odds by $e^{\beta_1}$, the odds ratio. That single number is what makes binary logistic regression so interpretable in fields like medicine and credit scoring.
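The odds-ratio interpretation can be checked numerically. The sketch below uses illustrative helper names and makes no assumption about the coefficient values; it computes the odds at $x$ and at $x + 1$ and returns their ratio, which should match $e^{\beta_1}$ regardless of $x$.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def odds(p: float) -> float:
    """Odds of the positive class: p / (1 - p)."""
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Log-odds, the inverse of the sigmoid."""
    return math.log(odds(p))

def odds_multiplier(beta0: float, beta1: float, x: float) -> float:
    """Ratio of the odds at x + 1 to the odds at x."""
    p_now = sigmoid(beta0 + beta1 * x)
    p_next = sigmoid(beta0 + beta1 * (x + 1.0))
    return odds(p_next) / odds(p_now)
```

Calling `odds_multiplier` with any starting `x` returns the same value up to rounding, `math.exp(beta1)`, which is why a single coefficient summarizes the effect of a feature on the odds.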
How binary logistic regression is fit
Unlike linear regression, there is no tidy closed-form solution for the coefficients. Instead, binary logistic regression is fit by maximum likelihood estimation (MLE): the optimizer searches for the $\beta$ values that make the observed 0/1 labels as probable as possible under the model. Equivalently, it minimizes the log loss (binary cross-entropy):
$$\mathcal{L} = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \ln p_i + (1-y_i)\ln(1-p_i)\Big]$$
Because minimizing this cost has no closed-form solution, it is solved iteratively with gradient descent or one of its cousins, the same family of optimizers that trains deep neural networks. The loss is convex, so it has no spurious local minima: as long as the classes are not perfectly separable, gradient descent with a suitable step size converges to a global minimum. A typical run looks like this:
- Start with initial guesses for $\beta_0$ and $\beta_1$ (often zeros).
- Predict a probability $p_i$ for every training example with the sigmoid.
- Measure the log loss between those probabilities and the true 0/1 labels.
- Update the coefficients in the direction that lowers the loss.
- Repeat until the loss stops improving — the coefficients have converged.
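The steps above can be sketched in plain Python for the single-feature case. This is a minimal illustration under stated assumptions, not a production fitter: the learning rate, tolerance, and iteration cap are arbitrary choices, and the function names are made up for the example. The gradient used is the standard one for mean log loss, the average of $(p_i - y_i)$ for the intercept and of $(p_i - y_i)\,x_i$ for the slope.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(xs, ys, b0, b1, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(b0 + b1 * x), eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(xs)

def fit(xs, ys, lr=0.1, tol=1e-9, max_steps=20000):
    """Gradient descent on log loss, starting from zero coefficients."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    prev = log_loss(xs, ys, b0, b1)
    for _ in range(max_steps):
        # Prediction error p_i - y_i for every training example.
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Step against the gradient of the mean log loss.
        b0 -= lr * sum(errors) / n
        b1 -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        current = log_loss(xs, ys, b0, b1)
        if prev - current < tol:  # loss has stopped improving
            break
        prev = current
    return b0, b1

# Made-up, deliberately non-separable toy data (not from the article).
toy_x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
toy_y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
```

Calling `fit(toy_x, toy_y)` returns an intercept and slope whose log loss is lower than the starting value of $\ln 2$ obtained with all-zero coefficients. Statistical packages often use Newton-type methods instead, which typically need far fewer iterations.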
Worked example: predicting exam pass/fail
Suppose you collect data on students and fit a binary logistic regression to predict whether each one passes ($y=1$) or fails ($y=0$) an exam from a single input, $x =$ hours studied. The fitting routine returns coefficients $\beta_0 = -4.08$ and $\beta_1 = 1.50$. The model is:
$$p = \frac{1}{1 + e^{-(-4.08 + 1.50\,x)}}$$
Now predict for a student who studied 3 hours. First compute the linear score (the log-odds):
$$z = -4.08 + 1.50 \times 3 = -4.08 + 4.50 = 0.42$$
Then squash it through the sigmoid:
$$p = \frac{1}{1 + e^{-0.42}} \approx \frac{1}{1 + 0.657} \approx 0.60$$
The model estimates a 60% chance of passing. Since $0.60 \ge 0.5$, the prediction is class 1, a pass. The table below shows how the probability climbs with study time and crosses the 0.5 threshold a little before 3 hours.
| Hours studied ($x$) | Log-odds $z=-4.08+1.50x$ | Probability $p$ | Predicted class |
|---|---|---|---|
| 1 | $-2.58$ | $\approx 0.07$ | Fail (0) |
| 2 | $-1.08$ | $\approx 0.25$ | Fail (0) |
| 2.72 | $\approx 0.00$ | $\approx 0.50$ | Boundary |
| 3 | $0.42$ | $\approx 0.60$ | Pass (1) |
| 4 | $1.92$ | $\approx 0.87$ | Pass (1) |
| 5 | $3.42$ | $\approx 0.97$ | Pass (1) |
The decision boundary sits where $z = 0$, i.e. at $x = 4.08 / 1.50 = 2.72$ hours. Below it the model predicts fail; above it, pass. Try the same math on your own numbers with our logistic regression calculator.
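The table can be recomputed with a short script. The coefficients are the ones quoted in the worked example; the function and variable names are illustrative.

```python
import math

BETA0, BETA1 = -4.08, 1.50  # coefficients from the worked example

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def table_rows(hours_list):
    """Return (hours, log-odds, probability, predicted class) for each input."""
    rows = []
    for hours in hours_list:
        z = BETA0 + BETA1 * hours
        p = sigmoid(z)
        rows.append((hours, z, p, 1 if p >= 0.5 else 0))
    return rows

# Study time at which the score is zero and the probability is 0.5.
boundary_hours = -BETA0 / BETA1

for hours, z, p, label in table_rows([1, 2, 3, 4, 5]):
    print(f"{hours:>2} h  z={z:6.2f}  p={p:.2f}  class={label}")
print(f"boundary at {boundary_hours:.2f} hours")
```

The boundary row is computed separately because evaluating the score at exactly 2.72 in floating point can land a hair above or below zero, which would make the printed label arbitrary.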
Binary vs multinomial vs ordinal logistic regression
“Binary” describes the number of outcome classes. When you have more than two, you reach for a sibling model. The choice depends on how many categories there are and whether they have a natural order:
| Type | Number of classes | Order matters? | Example outcome |
|---|---|---|---|
| Binary | Exactly 2 | No | Pass / fail, spam / not spam |
| Multinomial | 3 or more | No (unordered) | Cat / dog / bird |
| Ordinal | 3 or more | Yes (ordered) | Low / medium / high rating |
If your target has three or more unordered categories, see multinomial logistic regression. Ordinal logistic regression handles ranked categories where the spacing between levels is not assumed equal. For the broader comparison with continuous prediction, read logistic regression vs linear regression.
Where binary logistic regression is used
Because so many real decisions are genuinely two-sided, binary logistic regression shows up everywhere:
- Spam detection — is this email spam or not?
- Customer churn — will this subscriber cancel next month, yes or no?
- Medical screening — does this patient have the disease, positive or negative?
- Credit risk — will this applicant default on the loan?
- Conversion prediction — will this visitor click or buy?
🤖 ML context
Binary logistic regression is the simplest neural network: a single neuron with a sigmoid activation, trained by gradient descent on log loss. Master it and you understand the building block behind deep classifiers. It sits squarely in the supervised learning family. Build intuition with the logistic regression calculator, then extend to many classes with multinomial logistic regression.
Key takeaways
Binary logistic regression models the probability of a two-class outcome with the sigmoid, fits its coefficients by maximum likelihood, and converts that probability into a label at the 0.5 threshold. The log-odds form makes it linear and interpretable, the decision boundary sits where the score is zero, and the same idea scales to many classes through multinomial and ordinal variants. Continue with the logistic regression calculator, compare it to a line in logistic regression vs linear regression, or read the formal reference on Wikipedia.