Limits and Continuity: The Essential Prerequisites for ML Calculus

Limits and continuity are the twin foundations upon which all of calculus is built — and by extension, the mathematical backbone of modern machine learning. If you have ever wondered how a neural network learns, how gradient descent converges, or why certain activation functions behave the way they do, the answers trace back to these two fundamental concepts.

This pillar guide walks you through everything you need to know about limits and continuity before you dive into derivatives, integrals, and the calculus that powers ML algorithms. Each section links to a dedicated in-depth spoke article so you can go deeper on any topic.


1. Why Limits and Continuity Matter in Machine Learning

Machine learning is applied mathematics at scale. At the heart of training any ML model lies an optimization process — finding the set of parameters that minimizes a loss function. That optimization process depends entirely on derivatives. And derivatives are defined using limits.

Consider gradient descent, the workhorse optimizer of deep learning. At each training step, the algorithm computes the gradient of the loss function with respect to the model’s weights. The gradient is a derivative. The derivative is a limit. Without a solid grasp of limits and continuity, the gradient descent algorithm is just a black box.

Continuity matters too. A loss function that is continuous and smooth allows gradient-based optimizers to work reliably. When a function has a discontinuity — a sudden jump or break — optimizers can get stuck, oscillate, or diverge. Understanding continuity helps you design better models and debug problematic training runs.

Beyond gradient descent, limits and continuity appear in:

  • Activation functions — sigmoid, tanh, and ReLU all exhibit interesting limit behavior at extreme input values.
  • Regularization — L1 and L2 norms rely on continuous penalty functions.
  • Convergence proofs — proving that a sequence of model parameters converges to an optimal solution requires limit theory.
  • Probability and statistics — the Central Limit Theorem, a cornerstone of statistical ML, describes the limiting distribution of sums of random variables.

This guide will give you a rigorous yet accessible tour of all these ideas, building your intuition step by step.

2. What Is a Limit? The Core Idea.

The concept of a limit in calculus is deceptively simple: a limit describes the value that a function approaches as its input gets arbitrarily close to some point.

Formally, we write:

lim[x → a] f(x) = L

This reads: “The limit of f(x) as x approaches a equals L.” It means that as x gets closer and closer to a — from either side — f(x) gets closer and closer to L. Critically, x never actually reaches a; we only care about what the function approaches.

A simple example:

f(x) = (x² - 1) / (x - 1)

At x = 1, this function is undefined (division by zero). But factoring the numerator gives:

f(x) = (x + 1)(x - 1) / (x - 1) = x + 1   (for x ≠ 1)

So as x → 1, f(x) → 2. The limit is 2, even though f(1) is undefined.

This is the power of limits: they let us reason about function behavior at points where the function might not even exist. In ML, this is analogous to understanding what happens at boundary conditions, at initialization, or at extreme parameter values.
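The approach-from-both-sides idea is easy to check numerically. Here is a minimal sketch in plain Python (no libraries) that evaluates f at inputs closing in on x = 1:

```python
# f is undefined at x = 1, but we can watch it approach 2 from both sides.
def f(x):
    return (x**2 - 1) / (x - 1)

for dx in (0.1, 0.01, 0.001):
    print(f(1 - dx), f(1 + dx))   # both columns close in on 2
```

Building a table like this is exactly how limits are motivated before the algebraic machinery arrives.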

👉 Deep dive: What is a Limit in Calculus? A Beginner’s Guide

3. Left-Hand and Right-Hand Limits.

Not all functions approach the same value from both sides. This is where left-hand vs right-hand limits become essential.

  • Left-hand limit: lim[x → a⁻] f(x) — x approaches a from values less than a.
  • Right-hand limit: lim[x → a⁺] f(x) — x approaches a from values greater than a.

For a two-sided limit to exist, both one-sided limits must exist and be equal:

lim[x → a] f(x) = L   ⟺   lim[x → a⁻] f(x) = L  AND  lim[x → a⁺] f(x) = L

Why this matters for ML: The ReLU activation function is defined as f(x) = max(0, x). At x = 0, the left-hand limit is 0 and the right-hand limit is also 0, so the two-sided limit exists. However, the derivative does not — and understanding why requires one-sided limit analysis.

👉 Deep dive: Left-Hand vs Right-Hand Limits Explained

4. Key Properties of Limits.

Working with limits becomes practical once you know their core properties. These rules let you break complex expressions into simpler parts.

Property            Rule
Sum Rule            lim[f + g] = lim f + lim g
Product Rule        lim[f · g] = lim f · lim g
Quotient Rule       lim[f / g] = lim f / lim g   (if lim g ≠ 0)
Constant Multiple   lim[c · f] = c · lim f
Power Rule          lim[f(x)ⁿ] = (lim f(x))ⁿ

These properties allow you to evaluate limits algebraically without building a table of values. They are the engine behind every symbolic limit computation you will encounter in calculus — and in ML theory derivations.

Special techniques include factoring, rationalization, and L’Hôpital’s Rule for indeterminate forms like 0/0 and ∞/∞.
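These rules can be sanity-checked symbolically. A quick sketch using SymPy (covered in more depth in section 11), verifying the sum and product rules at a = 0 and resolving a 0/0 indeterminate form:

```python
from sympy import symbols, limit, sin

x = symbols('x')
f = x**2 + 1
g = sin(x)

# Sum rule: the limit of a sum equals the sum of the limits
assert limit(f + g, x, 0) == limit(f, x, 0) + limit(g, x, 0)
# Product rule: the limit of a product equals the product of the limits
assert limit(f * g, x, 0) == limit(f, x, 0) * limit(g, x, 0)

# An indeterminate 0/0 form, resolved automatically
print(limit(sin(x)/x, x, 0))   # 1
```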

👉 Deep dive: Core Properties of Limits with Step-by-Step Examples

5. Limits at Infinity and Asymptotes.

Limits approaching infinity describe how a function behaves as its input grows without bound. This connects directly to the concept of asymptotes.

lim[x → ∞] f(x) = L   →   Horizontal asymptote at y = L
lim[x → a] f(x) = ∞   →   Vertical asymptote at x = a

Critical ML examples:

The sigmoid function, one of the most widely used ML activation functions, is:

σ(x) = 1 / (1 + e^(-x))
  • As x → +∞: σ(x) → 1
  • As x → -∞: σ(x) → 0

These horizontal asymptotes at y = 0 and y = 1 explain why sigmoid outputs are bounded between 0 and 1 — making it ideal for probabilistic classification. This behavior is entirely captured by limits at infinity.

Similarly, the tanh function has asymptotes at y = -1 and y = +1, which can be proven using limit analysis.
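These asymptotes are easy to observe numerically. A minimal check of sigmoid and tanh at large inputs:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Far from the origin, sigmoid and tanh hug their horizontal asymptotes.
print(sigmoid(20.0), sigmoid(-20.0))       # near 1 and near 0
print(math.tanh(20.0), math.tanh(-20.0))   # near 1 and near -1
```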

👉 Deep dive: Limits Approaching Infinity: Asymptotes in Math

6. The Squeeze Theorem.

The Squeeze Theorem (also called the Sandwich Theorem) is a powerful tool for finding limits when direct substitution fails. It states:

If g(x) ≤ f(x) ≤ h(x) near x = a, and lim g(x) = lim h(x) = L, then lim f(x) = L.

The classic example is proving that lim[x → 0] (sin x / x) = 1, which is foundational to the derivative of sin(x).

In machine learning, the Squeeze Theorem appears in convergence proofs — showing that an algorithm’s error is bounded between two functions that both converge to zero.

👉 Deep dive: The Squeeze Theorem Explained for Beginners

7. What Is Continuity?

Continuity in mathematical functions means, intuitively, that you can draw the function without lifting your pen. Formally, f(x) is continuous at x = a if and only if:

  1. f(a) is defined.
  2. lim[x → a] f(x) exists.
  3. lim[x → a] f(x) = f(a).

All three conditions must hold. This three-part definition is crucial — it is the bridge between limits and the rest of calculus.

Why continuity matters for ML:

  • Continuous loss functions allow gradient-based optimization to work smoothly.
  • The backpropagation algorithm requires differentiability, which in turn requires continuity.
  • Continuous activation functions (like sigmoid and tanh) historically dominated before ReLU showed that continuous but non-smooth, piecewise-linear functions can also work well in practice.

A function continuous on its entire domain is called everywhere continuous. Polynomials, exponential functions, sine, and cosine are all everywhere continuous — which is why they appear so frequently in ML mathematics.

👉 Deep dive: What is Continuity in Mathematical Functions?

8. Types of Discontinuities.

When one or more of the three continuity conditions fail, we have a discontinuity. The three main types are:

Removable Discontinuity: The limit exists, but f(a) is either undefined or equals a different value. These can be “fixed” by redefining the function at one point. Example: f(x) = sin(x)/x has a removable discontinuity at x = 0.

Jump Discontinuity: The left-hand and right-hand limits both exist but are not equal. The function “jumps” at that point. The Heaviside step function used in early neural networks has a jump discontinuity at x = 0.

Infinite (Essential) Discontinuity: The function approaches ±∞ at some point. Example: f(x) = 1/x at x = 0.

Understanding these categories helps ML practitioners identify why certain functions cause training instability and why smooth approximations (like softmax instead of argmax) are preferred.
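One-sided limits make these categories mechanical to detect. A rough classifier sketch in SymPy (the function name and return labels are illustrative):

```python
from sympy import symbols, limit, sin

x = symbols('x')

def classify(expr, a):
    left = limit(expr, x, a, dir='-')
    right = limit(expr, x, a, dir='+')
    if left.is_infinite or right.is_infinite:
        return "infinite"            # function blows up at a
    if left != right:
        return "jump"                # one-sided limits disagree
    value = expr.subs(x, a)
    return "continuous" if value == left else "removable"

print(classify(sin(x)/x, 0))   # removable
print(classify(1/x, 0))        # infinite
```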

👉 Deep dive: Types of Discontinuities: Jump, Removable, and Infinite Explained

9. The Intermediate Value Theorem.

The Intermediate Value Theorem (IVT) states:

If f is continuous on [a, b] and N is any value between f(a) and f(b), then there exists at least one c in (a, b) such that f(c) = N.

In plain terms: a continuous function on a closed interval hits every value between its endpoints. It cannot skip over a value.

ML application: The IVT guarantees the existence of solutions to equations involving continuous functions — including loss functions. If your loss is 10 at epoch 0 and 0.1 at epoch 100, and the loss function is continuous, there must have been a moment when the loss was exactly 5, or 1, or any value in between. The IVT also underpins root-finding algorithms used in some optimization methods.

👉 Deep dive: The Intermediate Value Theorem Simplified

10. Limits in ML: Activation Functions and Asymptotes.

Now that you have the theoretical tools, let us connect limits in machine learning to real-world model design.

Every major activation function is defined by its limiting behavior:

Activation   lim[x → -∞]   lim[x → +∞]   Implication
Sigmoid      0             1             Bounded output, vanishing gradients
Tanh         -1            1             Centered output, still vanishing gradients
ReLU         0             +∞            Sparse activation, no upper bound
Softplus     0             +∞            Smooth ReLU approximation
GELU         0             +∞            Used in transformers (GPT, BERT)

The vanishing gradient problem — a major challenge in training deep networks — is directly caused by the asymptotic behavior of sigmoid and tanh. When |x| is large, these functions are nearly flat, their derivatives approach 0, and gradients vanish during backpropagation. Limit analysis reveals this problem before you ever run a single training epoch.

👉 Deep dive: How Limits and Asymptotes Apply to ML Activation Functions

11. Calculating Limits in Python with SymPy.

Theory is valuable. Code is actionable. The SymPy library brings symbolic mathematics to Python, allowing you to compute limits analytically — not just numerically.

```python
from sympy import symbols, limit, exp, sin, oo

x = symbols('x')

# Basic limit
f = (x**2 - 1) / (x - 1)
print(limit(f, x, 1))          # Output: 2

# Limit at infinity (sigmoid)
sigmoid = 1 / (1 + exp(-x))
print(limit(sigmoid, x, oo))   # Output: 1
print(limit(sigmoid, x, -oo))  # Output: 0

# Classic trig limit
print(limit(sin(x)/x, x, 0))   # Output: 1
```

SymPy handles indeterminate forms, L’Hôpital’s Rule applications, and one-sided limits automatically. It is an essential tool for any ML practitioner who wants to verify theoretical results computationally.
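For instance, the one-sided limits of 1/x at 0 come out directly via the dir argument of limit:

```python
from sympy import symbols, limit, oo

x = symbols('x')
print(limit(1/x, x, 0, dir='+'))   # oo  (right-hand limit)
print(limit(1/x, x, 0, dir='-'))   # -oo (left-hand limit)
```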

👉 Deep dive: Calculating Limits in Python using the SymPy Library

12. Summary and Next Steps.

Limits and continuity are not abstract curiosities — they are the precise mathematical tools that make machine learning possible. Here is what you have learned in this guide:

  • A limit describes what value a function approaches, even if it never reaches it.
  • One-sided limits reveal behavior from the left and right independently.
  • Limit properties let you evaluate complex expressions algebraically.
  • Limits at infinity explain horizontal asymptotes in activation functions.
  • The Squeeze Theorem proves limits by bounding a function between two simpler ones.
  • Continuity requires the function to be defined, have a limit, and have the two agree.
  • The three types of discontinuities (removable, jump, infinite) each have distinct ML implications.
  • The IVT guarantees that continuous functions hit every intermediate value.
  • Activation functions like sigmoid, tanh, and ReLU are best understood through their limiting behavior.
  • SymPy lets you compute all of this in Python with just a few lines of code.

Your next steps:

Once you are comfortable with limits and continuity, the natural progression is:

  1. Derivatives and Differentiation Rules — built directly on limit definitions.
  2. Partial Derivatives and Gradients — the multi-variable extension critical for ML.
  3. The Chain Rule — the mathematical heart of backpropagation.
  4. Optimization Theory — how calculus finds minima of loss functions.

Each of those topics rests on the foundation you have just built. Take the time to work through the spoke articles below, practice the problems, and use SymPy to verify your answers.
