Most calculus textbooks introduce the derivative by stating a rule, demonstrating that it works, and then moving on. The rule works, but this approach leaves students with a formula machine rather than a conceptual tool. When the rule fails to apply — when the function is defined piecewise, or when a limit does not exist — students who learned only the rule have nothing to fall back on. This guide builds the derivative from first principles.
The question the derivative answers
A function f(x) assigns a number to each input. The derivative at a point x = a answers the question: if I increase the input by a tiny amount, how much does the output change, per unit of input increase?
This is a rate of change. Nothing more. The slope of a line is a constant rate of change — rise over run. The derivative extends this idea to curves, where the rate of change is different at every point.
The secant line
Take two points on the graph of f: the point (a, f(a)) and a nearby point (a+h, f(a+h)), where h is some nonzero horizontal distance. The line connecting these two points is called a secant line. Its slope is:
slope of secant = (f(a+h) − f(a)) / h
This is the average rate of change of f over the interval from a to a+h. If you were driving, this would be your average speed over a stretch of road — not your speed at any particular moment.
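The secant slope is just arithmetic, so it translates directly into code. Below is a minimal Python sketch, assuming f is an ordinary Python function of one number. The names `secant_slope` and `square` are hypothetical, chosen here for illustration.

```python
def secant_slope(f, a, h):
    """Average rate of change of f over [a, a + h]: (f(a+h) - f(a)) / h."""
    if h == 0:
        # Two distinct points are needed; h = 0 would divide by zero.
        raise ValueError("h must be nonzero")
    return (f(a + h) - f(a)) / h


def square(x):
    return x * x


# Secant slopes of x^2 starting at a = 3 for two step sizes.
print(secant_slope(square, 3.0, 1.0))  # 7.0, i.e. (16 - 9) / 1
print(secant_slope(square, 3.0, 0.5))  # 6.5, i.e. (12.25 - 9) / 0.5
```

Each value is an average over a whole interval, like the average speed over a stretch of road.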
The limit
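Now imagine shrinking h toward zero. The second point slides along the curve toward the first, and the secant line pivots toward the tangent line at a: the line through (a, f(a)) that points in the same direction as the curve at that point. (A tangent line is not defined by touching the curve without crossing it; the x-axis is tangent to y = x³ at x = 0 and crosses the curve there.) The slope of the tangent line at x = a is: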
f'(a) = lim[h→0] (f(a+h) − f(a)) / h
This limit — if it exists — is the derivative of f at the point a.
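You can watch the limit emerge numerically. The sketch below evaluates the difference quotient for f(x) = x² at a = 3 with h = 0.1, 0.01, and so on down to 0.000001. It assumes ordinary floating-point arithmetic, so very small h eventually suffers rounding error, and a finite list of quotients only suggests the limit; it does not prove it. The function names are hypothetical.

```python
def difference_quotient(f, a, h):
    # Slope of the secant through (a, f(a)) and (a + h, f(a + h)).
    return (f(a + h) - f(a)) / h


def square(x):
    return x * x


# Shrink h by a factor of 10 each step. Algebraically the quotient is
# 6 + h (up to rounding), so the values approach f'(3) = 6.
for k in range(1, 7):
    h = 10.0 ** -k
    print(f"h = {h:.0e}   quotient = {difference_quotient(square, 3.0, h):.8f}")
```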
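Several things can prevent the limit from existing. The most common are a corner (a sharp turn, like |x| at x = 0, where the slopes from the left and right disagree), a cusp (like x^(2/3) at x = 0, where the slopes shoot off to −∞ on one side and +∞ on the other), a vertical tangent (like x^(1/3) at x = 0, where the difference quotient grows without bound), and a discontinuity such as a jump. In each case, the derivative does not exist at that point.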
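One way to see these failures is to compare the difference quotient approaching from the right (step +h) with the one approaching from the left (step −h). The sketch below does this at a = 0 for three illustrative functions chosen here: a cusp x^(2/3), a vertical tangent x^(1/3), and a unit step with a jump. All function names are hypothetical, and checking a few finite values of h is a heuristic, not a proof.

```python
import math


def one_sided_quotients(f, a, h):
    """Return (left, right) difference quotients of f at a for a step h > 0."""
    right = (f(a + h) - f(a)) / h       # approach from the right
    left = (f(a - h) - f(a)) / (-h)     # approach from the left
    return left, right


def cusp(x):
    # x^(2/3) for real x: the real cube root squared equals |x|^(2/3)
    return abs(x) ** (2 / 3)


def vertical_tangent(x):
    # Real cube root x^(1/3); copysign keeps negative inputs real-valued
    return math.copysign(abs(x) ** (1 / 3), x)


def jump(x):
    # Unit step: 0 for x < 0, 1 for x >= 0
    return 0.0 if x < 0 else 1.0


for name, f in [("cusp", cusp), ("vertical tangent", vertical_tangent), ("jump", jump)]:
    for h in (1e-2, 1e-4, 1e-6):
        left, right = one_sided_quotients(f, 0.0, h)
        print(f"{name:17} h={h:.0e}  left={left:12.2f}  right={right:12.2f}")
```

For the cusp, the left quotients head toward −∞ and the right ones toward +∞. For the vertical tangent, both grow without bound through positive values. For the jump, the right quotient stays at 0 while the left one blows up. None of them settles on a single finite number.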
Computing with the definition
Let f(x) = x². What is f'(x)?
f(x+h) = (x+h)² = x² + 2xh + h²
f(x+h) − f(x) = x² + 2xh + h² − x² = 2xh + h²
(f(x+h) − f(x)) / h = (2xh + h²) / h = 2x + h   (cancelling h is valid because h ≠ 0; the limit never sets h equal to 0)
lim[h→0] (2x + h) = 2x
So f'(x) = 2x. The derivative of x² is 2x — which is exactly what the power rule gives (bring the exponent down, reduce it by 1). But now you know why the power rule works for x²: it is the algebraic consequence of expanding a binomial squared and taking a limit.
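Python's standard `fractions` module does exact rational arithmetic, which makes it a convenient way to spot-check that the difference quotient of x² equals 2x + h exactly, not just approximately. This is a check on a few sample values, not a proof; the algebra is the proof. The helper names are hypothetical.

```python
from fractions import Fraction


def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h


def square(x):
    return x * x


# With exact rational arithmetic there is no rounding, so the quotient
# should equal 2x + h exactly for every nonzero h.
x = Fraction(3)
for h in (Fraction(1, 2), Fraction(1, 100), Fraction(-1, 7)):
    q = difference_quotient(square, x, h)
    assert q == 2 * x + h
    print(h, q)
```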
Why most students never see this
The formal definition requires comfort with limits, which most courses build up slowly. Textbooks therefore often introduce the derivative rules (power rule, product rule, chain rule) before covering the rigorous definition. This is pragmatically defensible: students need to differentiate functions well before they can reason formally about limits. But it leaves the definition feeling like an artefact of the introduction rather than the source of everything that follows.
The derivative as a function
When we computed f'(x) = 2x above, we found the derivative at every point simultaneously. f'(x) is itself a function — for each input x, it returns the slope of the tangent to f at that x. The derivative of f'(x) is f''(x), the second derivative, which tells you how the slope itself is changing.
For f(x) = x²: f'(x) = 2x and f''(x) = 2. The second derivative is constant: the slope increases at a uniform rate, by 2 for every unit increase in x. Because f'' is positive everywhere, the parabola bends upward (is concave up) at every point.
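The idea that differentiation turns one function into another can be mirrored in code as a function that takes a function and returns a new one. The sketch below is hypothetical (`derivative` is not a standard library function) and swaps the limit for a symmetric (central) difference quotient with a fixed h, a common numerical approximation rather than the definition itself. Applying it twice approximates the second derivative.

```python
def derivative(f, h=1e-4):
    """Return a new function approximating f' with a central difference.

    (f(x+h) - f(x-h)) / (2h) stands in for the limit; for a quadratic it
    is exact apart from floating-point rounding.
    """
    def f_prime(x):
        return (f(x + h) - f(x - h)) / (2 * h)
    return f_prime


def square(x):
    return x * x


f1 = derivative(square)  # approximates f'(x) = 2x
f2 = derivative(f1)      # approximates f''(x) = 2

for x in (-1.0, 0.0, 2.5):
    print(x, round(f1(x), 6), round(f2(x), 6))
```

The first column of results tracks 2x; the second stays at 2 regardless of x, matching the constant second derivative.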
Differentiability versus continuity
A function can be continuous but not differentiable. The absolute value function |x| is continuous at x = 0 (no jump), but not differentiable there (sharp corner — the left-hand limit of the difference quotient is −1, the right-hand limit is +1, and they disagree).
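The two properties can be checked separately for |x| at 0. The sketch below (helper name hypothetical) prints the change in output, which shrinks to 0 as h shrinks (continuity), alongside the left and right difference quotients, which stay fixed at −1 and +1 (no single slope). A few sampled h values illustrate the point; they do not prove it.

```python
def one_sided_quotients(f, a, h):
    """Return (left, right) difference quotients of f at a for a step h > 0."""
    right = (f(a + h) - f(a)) / h
    left = (f(a - h) - f(a)) / (-h)
    return left, right


a = 0.0
for h in (0.1, 1e-3, 1e-8):
    change = abs(a + h) - abs(a)  # shrinks with h: |x| is continuous at 0
    left, right = one_sided_quotients(abs, a, h)  # stays at -1.0 and 1.0
    print(f"h={h:g}  change={change:g}  left={left}  right={right}")
```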
A function that is differentiable at a point is always continuous there. The reverse is not true. This asymmetry trips up many first-year calculus students who assume that smoothness and continuity are the same.
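The reason differentiability forces continuity takes one line. For h ≠ 0, write the change in output as the difference quotient times h, then take the limit of each factor. This works only because f'(a) is assumed to exist as a finite number:

```latex
\lim_{h \to 0} \bigl( f(a+h) - f(a) \bigr)
  = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} \cdot h
  = f'(a) \cdot 0
  = 0,
\qquad\text{so}\qquad \lim_{h \to 0} f(a+h) = f(a).
```

The argument cannot run backwards: for |x| at 0 the change in output still goes to 0, but the quotient has no limit, so continuity says nothing about the slope.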
Frequently Asked Questions
Do I need to use the formal definition every time I differentiate?
No. The power rule, product rule, chain rule, and other standard rules are derived from the formal definition, and using them is perfectly rigorous. The definition is what you use when the standard rules do not apply, or when you are asked to prove why a rule works.
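As a quick sanity check that a rule agrees with the definition, take the illustrative function g(x) = x² sin x (chosen here, not from the text). The product rule gives g'(x) = 2x sin x + x² cos x, and the difference quotient should approach the same value as h shrinks. Agreement at a few values of h is evidence, not proof; the proof of the product rule itself comes from the limit definition. Function names are hypothetical.

```python
import math


def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h


def g(x):
    # Product of two functions: x^2 * sin(x)
    return x * x * math.sin(x)


def g_prime_by_rules(x):
    # Product rule: (x^2)' sin x + x^2 (sin x)' = 2x sin x + x^2 cos x
    return 2 * x * math.sin(x) + x * x * math.cos(x)


x = 1.3
for h in (1e-2, 1e-4, 1e-6):
    print(h, difference_quotient(g, x, h), g_prime_by_rules(x))
```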
What is the difference between the derivative and the differential?
At a single point, the derivative f'(a) is a number: the instantaneous rate of change at a. Taken over all x, f'(x) is a function. The differential dy = f'(x)dx is a linearisation tool: for a small change dx in the input, it approximates the actual change in output, Δy = f(x + dx) − f(x). They are related but different objects: dy follows the tangent line, while Δy follows the curve.
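For f(x) = x², the actual change is Δy = (x + dx)² − x² = 2x·dx + dx², while the differential is dy = 2x·dx, so the two differ by exactly dx². The sketch below checks this at x = 3 and dx = 0.1, values chosen for illustration; in floating point the printed numbers carry tiny rounding errors.

```python
def f(x):
    return x * x


def f_prime(x):
    return 2 * x  # f'(x) = 2x, from the limit computation for x^2


x, dx = 3.0, 0.1
dy = f_prime(x) * dx          # differential: tangent-line estimate of the change
delta_y = f(x + dx) - f(x)    # actual change in output along the curve
print(dy, delta_y, delta_y - dy)  # gap is dx^2 = 0.01, up to rounding
```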