Binary Logistic Regression

Binary logistic regression is a standard machine learning model for predicting a two-class outcome (yes/no, pass/fail, spam/not spam). It estimates the probability that an example belongs to the positive class, then applies a threshold to make the call. It is the first classifier most beginners learn, and for good reason: it is simple, interpretable, and the building block behind neural-network classifiers.

[Figure: Binary logistic regression maps inputs to the probability of one of two classes.]

What is binary logistic regression?

Binary logistic regression is the logistic model applied to a target variable with exactly two possible outcomes, conventionally labeled $1$ (the positive class) and $0$ (the negative class). Where ordinary linear regression predicts an unbounded number, this model predicts $p$, the probability that the outcome is the positive class. To keep that probability inside the valid range, the model passes a linear combination of the inputs through the sigmoid (logistic) function:

$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

The sigmoid is an S-shaped curve whose output always lands strictly between 0 and 1, so it can be read directly as a probability. Here $\beta_0$ is the intercept and $\beta_1$ is the coefficient on the input $x$. Add more features and the exponent simply becomes $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$, but the shape and idea stay the same.
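As a minimal sketch of the formula above, the sigmoid and the resulting probability fit in a few lines of Python. The function names `sigmoid` and `predict_proba` are illustrative choices, not taken from any particular library, and the branch on the sign of `z` is one common way to avoid floating-point overflow.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued score z to a value between 0 and 1."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x: list[float], beta0: float, betas: list[float]) -> float:
    """Probability of the positive class for one example with features x."""
    # Linear score: beta0 + beta1*x1 + ... + betak*xk
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return sigmoid(z)
```

With one feature, `predict_proba([x], beta0, [beta1])` is the single-input formula; longer lists give the multi-feature version without any other change.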

Binary logistic regression, defined: A supervised classification model that estimates the probability of a two-class outcome by squashing a linear score $\beta_0 + \beta_1 x$ through the sigmoid, then assigns the class by comparing that probability to a threshold (usually $0.5$). The output is in the open interval $(0,1)$.

From probability to class: the 0.5 threshold

The sigmoid gives you a probability, but a classifier ultimately has to output a label. That final step is the threshold. By default you predict class 1 when $p \ge 0.5$ and class 0 otherwise. Because the sigmoid equals exactly $0.5$ when its exponent is zero, the rule $p \ge 0.5$ is the same as $\beta_0 + \beta_1 x \ge 0$. The set of inputs where the score equals zero is the decision boundary that separates the two predicted classes: a single point on the $x$ axis with one feature, a line with two features, and a hyperplane in higher dimensions.
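The equivalence between the probability rule and the score rule follows in a few steps. Writing $z = \beta_0 + \beta_1 x$ for the linear score:

$$p \ge \tfrac{1}{2} \iff \frac{1}{1 + e^{-z}} \ge \frac{1}{2} \iff 1 + e^{-z} \le 2 \iff e^{-z} \le 1 \iff z \ge 0$$

With a single feature and $\beta_1 > 0$, this is the same as $x \ge -\beta_0 / \beta_1$; when $\beta_1 < 0$ the inequality on $x$ flips.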

📊 The threshold is a choice: 0.5 is the default, but you can move it. If false negatives are costly (say, missing a disease), lower the threshold so you flag the positive class more eagerly. If false positives are costly (blocking real email as spam), raise it. The model’s probabilities stay the same; only the cutoff changes.
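A small sketch of how moving the cutoff changes the labels while the probabilities stay fixed. The probabilities below are made-up values for illustration, and `classify` is a hypothetical helper name.

```python
def classify(p: float, threshold: float = 0.5) -> int:
    """Return 1 when the probability reaches the threshold, else 0."""
    return 1 if p >= threshold else 0

# Made-up model outputs for four examples (illustration only).
probabilities = [0.15, 0.35, 0.55, 0.80]

default_labels = [classify(p) for p in probabilities]                # [0, 0, 1, 1]
eager_labels = [classify(p, threshold=0.3) for p in probabilities]   # [0, 1, 1, 1]
strict_labels = [classify(p, threshold=0.7) for p in probabilities]  # [0, 0, 0, 1]
```

Lowering the threshold to 0.3 flags one more example as positive (fewer false negatives, more false positives); raising it to 0.7 does the opposite.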

The log-odds (logit) form

The sigmoid equation has an equivalent, often more revealing form. If you take the odds $\frac{p}{1-p}$ and apply the natural log, the model becomes perfectly linear:

$$\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$$

The left side is the log-odds, also called the logit. This is the heart of why the algorithm carries the word “regression”: it performs a straight-line regression on the log-odds of the event. A one-unit increase in $x$ adds $\beta_1$ to the log-odds, which means it multiplies the odds by $e^{\beta_1}$ — the odds ratio. That single number is what makes binary logistic regression so interpretable in fields like medicine and credit scoring.
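The odds-ratio reading can be checked numerically. The sketch below uses hypothetical coefficients (`beta0 = -1.0`, `beta1 = 0.7`) chosen only for illustration, and compares the odds before and after a one-unit increase in $x$.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def odds(p: float) -> float:
    """Odds of the positive class: p / (1 - p)."""
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Log-odds, the inverse of the sigmoid."""
    return math.log(odds(p))

beta0, beta1 = -1.0, 0.7   # hypothetical coefficients, not fitted to any data
x = 2.0

p_before = sigmoid(beta0 + beta1 * x)
p_after = sigmoid(beta0 + beta1 * (x + 1.0))

# Ratio of the odds after and before the one-unit increase in x.
odds_ratio = odds(p_after) / odds(p_before)

print(f"odds ratio = {odds_ratio:.4f}, exp(beta1) = {math.exp(beta1):.4f}")
```

The two printed numbers agree up to floating-point error, and the same ratio comes out for any starting value of `x`. That is why a single coefficient can be summarized as one odds ratio.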

How binary logistic regression is fit

Unlike linear regression, there is no tidy closed-form solution for the coefficients. Instead, binary logistic regression is fit by maximum likelihood estimation (MLE): the optimizer searches for the $\beta$ values that make the observed 0/1 labels as probable as possible under the model. Equivalently, it minimizes the log loss (binary cross-entropy):

$$\mathcal{L} = -\frac{1}{n}\sum_{i=1}^{n}\Big[y_i \ln p_i + (1-y_i)\ln(1-p_i)\Big]$$
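A direct transcription of this loss into Python might look like the sketch below. The clipping constant `eps` is an added assumption to keep `math.log` away from zero; it is not part of the formula itself.

```python
import math

def log_loss(y_true: list[int], p_pred: list[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip so that log(0) can never occur (assumption added for numerical safety).
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)
```

A model that always predicts $p = 0.5$ scores $\ln 2 \approx 0.693$ on any data set, which makes a convenient baseline: a fitted model should come in below that.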

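Because this cost has no algebraic shortcut, it is minimized iteratively with gradient descent or a related method (statistical software often uses Newton-type solvers such as IRLS or L-BFGS), the same family of optimizers that trains deep neural networks. The loss is convex, so any minimum the optimizer reaches is the global one. The one caveat is perfectly separable data, where the coefficients can grow without bound unless regularization is added.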

  1. Start with initial guesses for $\beta_0$ and $\beta_1$ (often zeros).
  2. Predict a probability $p_i$ for every training example with the sigmoid.
  3. Measure the log loss between those probabilities and the true 0/1 labels.
  4. Update the coefficients in the direction that lowers the loss.
  5. Repeat until the loss stops improving — the coefficients have converged.
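The five steps above can be sketched in plain Python for the one-feature case. This is an illustrative implementation, not how any particular library does it: the learning rate, tolerance, iteration cap, and toy data are all assumptions. It uses the fact that for the mean log loss the gradients are $\partial\mathcal{L}/\partial\beta_0 = \frac{1}{n}\sum_i (p_i - y_i)$ and $\partial\mathcal{L}/\partial\beta_1 = \frac{1}{n}\sum_i (p_i - y_i)\,x_i$.

```python
import math

def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_loss(ys, ps, eps=1e-15):
    total = 0.0
    for y, p in zip(ys, ps):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(ys)

def fit_logistic(xs, ys, learning_rate=0.1, max_iters=10_000, tol=1e-9):
    """Fit beta0 and beta1 by batch gradient descent on the mean log loss."""
    beta0, beta1 = 0.0, 0.0                      # step 1: start from zeros
    n = len(xs)
    prev_loss = float("inf")
    loss = prev_loss
    for _ in range(max_iters):
        ps = [sigmoid(beta0 + beta1 * x) for x in xs]   # step 2: predict
        loss = log_loss(ys, ps)                          # step 3: measure
        if prev_loss - loss < tol:                       # step 5: stop when flat
            break
        prev_loss = loss
        # step 4: move against the gradient of the mean log loss
        grad0 = sum(p - y for p, y in zip(ps, ys)) / n
        grad1 = sum((p - y) * x for p, y, x in zip(ps, ys, xs)) / n
        beta0 -= learning_rate * grad0
        beta1 -= learning_rate * grad1
    return beta0, beta1, loss

# Toy, non-separable data invented for this sketch.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]

beta0, beta1, final_loss = fit_logistic(xs, ys)
print(f"beta0={beta0:.3f}  beta1={beta1:.3f}  loss={final_loss:.4f}")
```

The toy labels overlap on purpose (a 1 at $x=3$ and a 0 at $x=4$), so the data are not perfectly separable and the coefficients settle at finite values. On real problems the features are usually scaled first so that one learning rate works for every coefficient.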

Worked example: predicting exam pass/fail

Suppose you collect data on students and fit a binary logistic regression to predict whether each one passes ($y=1$) or fails ($y=0$) an exam from a single input, $x =$ hours studied. The fitting routine returns coefficients $\beta_0 = -4.08$ and $\beta_1 = 1.50$. The model is:

$$p = \frac{1}{1 + e^{-(-4.08 + 1.50\,x)}}$$

Now predict for a student who studied 3 hours. First compute the linear score (the log-odds):

$$z = -4.08 + 1.50 \times 3 = -4.08 + 4.50 = 0.42$$

Then squash it through the sigmoid:

$$p = \frac{1}{1 + e^{-0.42}} \approx \frac{1}{1 + 0.657} \approx 0.60$$

The model estimates a 60% chance of passing. Since $0.60 \ge 0.5$, the prediction is class 1 (pass). The table below shows how the probability climbs with study time and crosses the 0.5 threshold a little before 3 hours.

| Hours studied ($x$) | Log-odds $z=-4.08+1.50x$ | Probability $p$ | Predicted class |
|---|---|---|---|
| 1 | $-2.58$ | $\approx 0.07$ | Fail (0) |
| 2 | $-1.08$ | $\approx 0.25$ | Fail (0) |
| 2.72 | $\approx 0.00$ | $\approx 0.50$ | Boundary |
| 3 | $0.42$ | $\approx 0.60$ | Pass (1) |
| 4 | $1.92$ | $\approx 0.87$ | Pass (1) |
| 5 | $3.42$ | $\approx 0.97$ | Pass (1) |

The decision boundary sits where $z = 0$, i.e. at $x = 4.08 / 1.50 = 2.72$ hours. Below it the model predicts fail; above it, pass. Try the same math on your own numbers with our logistic regression calculator.
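To reproduce the worked example, the short script below plugs the same coefficients into the sigmoid for whole-hour inputs and computes the boundary. The names `pass_probability` and `boundary` are illustrative.

```python
import math

BETA0, BETA1 = -4.08, 1.50   # coefficients from the worked example

def pass_probability(hours: float) -> float:
    """Estimated probability of passing after the given hours of study."""
    z = BETA0 + BETA1 * hours          # linear score (log-odds)
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid

for hours in [1, 2, 3, 4, 5]:
    z = BETA0 + BETA1 * hours
    p = pass_probability(hours)
    label = "Pass (1)" if p >= 0.5 else "Fail (0)"
    print(f"{hours} h  z={z:+.2f}  p={p:.2f}  {label}")

# Hours of study at which the score is zero and p is exactly 0.5.
boundary = -BETA0 / BETA1
print(f"decision boundary at {boundary:.2f} hours")
```

Swapping in different values for `BETA0` and `BETA1` is enough to rerun the same calculation for another fitted model.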

Binary vs multinomial vs ordinal logistic regression

“Binary” describes the number of outcome classes. When you have more than two, you reach for a sibling model. The choice depends on how many categories there are and whether they have a natural order:

| Type | Number of classes | Order matters? | Example outcome |
|---|---|---|---|
| Binary | Exactly 2 | No | Pass / fail, spam / not spam |
| Multinomial | 3 or more | No (unordered) | Cat / dog / bird |
| Ordinal | 3 or more | Yes (ordered) | Low / medium / high rating |

If your target has three or more unordered categories, see multinomial logistic regression. Ordinal logistic regression handles ranked categories where the spacing between levels is not assumed equal. For the broader comparison with continuous prediction, read logistic regression vs linear regression.

Where binary logistic regression is used

Because so many real decisions are genuinely two-sided, binary logistic regression shows up everywhere:

  • Spam detection — is this email spam or not?
  • Customer churn — will this subscriber cancel next month, yes or no?
  • Medical screening — does this patient have the disease, positive or negative?
  • Credit risk — will this applicant default on the loan?
  • Conversion prediction — will this visitor click or buy?

✅ Evaluating a binary classifier: The simplest metric is accuracy, the share of predictions that are correct at the 0.5 threshold. But when classes are imbalanced, also look at precision, recall, and the ROC-AUC, which judges the ranking of probabilities independent of any single cutoff. The decision boundary at $p = 0.5$ is just the starting point.
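The metrics named above can be computed by hand, as in this sketch on a small made-up imbalanced data set (8 negatives, 2 positives). The helper names are illustrative, returning `0.0` for an empty denominator is a convention chosen here, and the AUC uses the pairwise definition: the share of positive/negative pairs in which the positive example gets the higher probability, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, yhat in pairs if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in pairs if y == 0 and yhat == 1)
    tn = sum(1 for y, yhat in pairs if y == 0 and yhat == 0)
    fn = sum(1 for y, yhat in pairs if y == 1 and yhat == 0)
    return tp, fp, tn, fn

def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0   # convention: 0.0 if undefined

def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0   # convention: 0.0 if undefined

def roc_auc(y_true, y_prob):
    """Pairwise AUC; assumes at least one positive and one negative example."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

# Made-up labels and probabilities for illustration.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.05, 0.10, 0.10, 0.20, 0.20, 0.30, 0.40, 0.60, 0.90, 0.45]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]   # default 0.5 threshold

print(accuracy(y_true, y_pred))   # 0.8
print(precision(y_true, y_pred))  # 0.5
print(recall(y_true, y_pred))     # 0.5
print(roc_auc(y_true, y_prob))    # 0.9375
```

Accuracy looks respectable at 0.8, yet the classifier finds only one of the two positives. The AUC of 0.9375 shows the probabilities rank the positives well, which suggests a lower threshold could recover the missed case.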

🤖 ML context

Binary logistic regression is the simplest neural network: a single neuron with a sigmoid activation, trained by gradient descent on log loss. Master it and you understand the building block behind deep classifiers. It sits squarely in the supervised learning family. Build intuition with the logistic regression calculator, then extend to many classes with multinomial logistic regression.

Frequently asked questions

What is binary logistic regression?
Binary logistic regression is a supervised classification model that predicts the probability of a two-class outcome (such as yes/no or pass/fail) by passing a linear combination of inputs through the sigmoid function, then assigning the class with a threshold, usually 0.5.
What is the difference between binary and multinomial logistic regression?
Binary logistic regression handles exactly two outcome classes. Multinomial logistic regression handles three or more unordered classes, and ordinal logistic regression handles three or more ordered classes.
How is binary logistic regression fit to data?
It is fit by maximum likelihood estimation, which is equivalent to minimizing the log loss (binary cross-entropy). Because there is no closed-form solution, the coefficients are found iteratively with gradient descent or a related optimizer.
How does the model turn a probability into a class?
It applies a threshold. By default it predicts the positive class when the probability is at least 0.5 and the negative class otherwise. The threshold can be raised or lowered to trade off false positives against false negatives.
What does the output of binary logistic regression mean?
The output is a probability between 0 and 1 that the example belongs to the positive class. A value of 0.61 means a 61 percent estimated chance of the positive outcome, which at a 0.5 threshold is classified as positive.

Key takeaways

Binary logistic regression models the probability of a two-class outcome with the sigmoid, fits its coefficients by maximum likelihood, and converts that probability into a label at the 0.5 threshold. The log-odds form makes it linear and interpretable, the decision boundary sits where the score is zero, and the same idea scales to many classes through multinomial and ordinal variants. Continue with the logistic regression calculator, compare it to a line in logistic regression vs linear regression, or read the formal reference on Wikipedia.
