9. Differentiation

9.1. Motivations and First Principle Arguments

The equation \(y=mx+c\) represents a straight line. In this expression, the constant \(m\) is the slope: it tells us how quickly \(y\) changes with \(x\).

For instance,

  • if \(m=1\) then \(y\) increases by 1 unit for every unit of increase in \(x\)

  • if \(m=2\) then \(y\) increases by 2 units for every unit of increase in \(x\)

  • if \(m=-2\) then \(y\) decreases by 2 units for every unit of increase in \(x\)

Worked Example

Find the slope of the straight line connecting the points (-3,2) and (5,7).

../_images/solution_1.png

Fig. 9.1 Calculating the slope of a straight line

Consider the straight line drawn through the points (-3,2) and (5,7):

  • the change in \(x\) is given by \(\Delta x=5-(-3)=8\)

  • the change in \(y\) is given by \(\Delta y=7-2=5\)

So \(y\) increases by 5 units for every 8 units of increase in \(x\).

The rate of change of \(y\) with \(x\) (the slope) is \(\displaystyle\frac{\Delta y}{\Delta x}=\displaystyle\frac{5}{8}\). This is the constant \(m\) in the equation of the line \(y=mx+c\).

Triangles drawn under the line have the same slope. By considering a triangle connecting the point (5,7) to (\(x\),\(y\)), we could write \(\displaystyle\frac{y−7}{x−5}\) = \(\displaystyle\frac{5}{8}\), which rearranges to \(y\)=\(\displaystyle\frac{5}{8}x\) + \(\displaystyle\frac{31}{8}\).
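The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `line_through` is a hypothetical helper, and exact fractions are used so that the result can be compared directly with \(\frac{5}{8}\) and \(\frac{31}{8}\).

```python
from fractions import Fraction

def line_through(p, q):
    """Return (slope, intercept) of the straight line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("vertical line: the slope is undefined")
    m = Fraction(y2 - y1, x2 - x1)  # change in y divided by change in x
    c = y1 - m * x1                 # rearranged from y = m x + c at the point p
    return m, c

m, c = line_through((-3, 2), (5, 7))
print(m, c)  # 5/8 31/8
```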

In a curve, the slope is not constant, but we can identify the slope at any point by drawing the tangent to the curve at that point. The tangent is the line that “just touches” the curve, and the normal is the line that is at right angles to the tangent.

../_images/slope.png

Fig. 9.2 The tangent at the point is indicated in blue and the normal is indicated in red. If the tangent has slope \(m\) then the normal has slope −1/\(m\).

In Fig. 9.2 we can see that if we moved the point along the curve, the slopes of both the tangent and the normal would change. We are interested in finding a mathematical expression for the slope of a curve at any given point \(x\).

../_images/height-base.png

Fig. 9.3 The curve shows a hypothetical function \(f\), and the black dashed line shows the tangent at the arbitrary point \((x,f(x))\). The slope of this line is the height:base ratio in the gray shaded triangle

As a first approximation, we construct a secant on the curve by joining the point at \(x\) to a nearby point at \(x+\Delta x\). Here, \(\Delta x\) means a small change in \(x\). This construction is shown graphically in Fig. 9.3, where we consider the secant line joining \((x,f(x))\) to a nearby point \((x+\Delta x,f(x+\Delta x))\).

The slope of the secant line is given by:

(9.1)\[\displaystyle\frac{Δf}{\Delta x}=\displaystyle\frac{f(x+\Delta x)−f(x)}{\Delta x}\]

As we make \(\Delta x\) smaller so that the two points are closer together, the secant line approaches the tangent. We are therefore interested in what happens to (9.1) “as \(\Delta x\) approaches zero”

Consider the function \(f(x)=x^2\). Using the definition given in (9.1), we have

\(\displaystyle\frac{\Delta f}{\Delta x}=\displaystyle\frac{f(x+\Delta x)-f(x)}{\Delta x}=\displaystyle\frac{(x+\Delta x)^2-x^2}{\Delta x}=\displaystyle\frac{x^2+2x\Delta x+(\Delta x)^2-x^2}{\Delta x}=2x+\Delta x\)

As \(\Delta x\) approaches zero the result approaches \(2x\), which we can write as \(\displaystyle\frac{\Delta f}{\Delta x}\to 2x\) as \(\Delta x\to 0\).

More formally, the result is written as

\(\displaystyle \lim_{\Delta x \to 0} \displaystyle\frac{f(x+\Delta x)-f(x)}{\Delta x}=2x\)

and we say that “in the limit \(\Delta x\to 0\)”, the result is \(2x\).
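The limiting behaviour can also be observed numerically. The sketch below evaluates the secant slope from (9.1) for \(f(x)=x^2\) at the arbitrarily chosen point \(x=3\), using a few shrinking values of \(\Delta x\). The function names are illustrative choices, not part of the notes.

```python
def secant_slope(f, x, dx):
    """Slope of the secant joining (x, f(x)) to (x + dx, f(x + dx))."""
    return (f(x + dx) - f(x)) / dx

def square(x):
    return x * x

x = 3.0
for dx in (1.0, 0.1, 0.01, 0.001):
    # For f(x) = x^2 the secant slope equals 2x + dx, so it approaches 2x = 6
    print(dx, secant_slope(square, x, dx))
```

The printed slopes differ from \(2x\) by approximately \(\Delta x\) (up to floating-point rounding), in agreement with the algebraic result \(2x+\Delta x\).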

../_images/x2.png

Fig. 9.4 A plot of the function \(y=x^2\), together with the tangent. The tangent has slope \(2x\).

Note that in this example, a factor of \(\Delta x\) was cancelled from the numerator and denominator. The limit is not evaluated at \(\Delta x=0\), but as \(\Delta x\) approaches \(0\). The derivative of a function \(f\) with respect to \(x\) is given by the result

(9.2)\[\displaystyle\frac{\mathrm{d}f}{\mathrm{d}x}=\displaystyle \lim_{\Delta x \to 0}\displaystyle\frac{f(x+\Delta x)-f(x)}{\Delta x}\]

The derivative \(\displaystyle\frac{\mathrm{d}f}{\mathrm{d}x}\) is also written \(f'(x)\). The form \(\displaystyle\frac{\mathrm{d}f}{\mathrm{d}x}\) is known as Leibniz notation, and the form \(f'(x)\) is known as Lagrange (or prime) notation.

The process of calculating the derivative is called “differentiation”.

As \(\Delta x\rightarrow 0\), both the numerator and denominator of the fraction tend to zero, yet in most cases we will see that their ratio approaches a finite limit. What determines the limit is how quickly the numerator approaches zero, relative to the denominator.

9.2. Derivative as a “rate of change”

Differentiation can be thought of as a measure of the rate of change of one variable with respect to another. For instance, \(\displaystyle\frac{\mathrm{d}y}{\mathrm{d}x}\) is a measure of how quickly \(y\) changes (instantaneously) as \(x\) changes. Here, we call \(y\) the dependent variable, and we call \(x\) the independent variable.

In many problems the independent variable is time. For example, consider the case of a simple pendulum shown in Fig. 9.5, where \(\theta(t)\) measures the anticlockwise angle of the pendulum from the downward vertical as a function of time \(t\). The pendulum is initially released from rest at a positive angle.

../_images/pendulum.png

Fig. 9.5 A pendulum swing diagram. The angle of inclination with the downward vertical is denoted by \(\theta\) and is measured in the anticlockwise direction. The graph on the right shows the rate of change of \(\theta\) with respect to time \(t\).

  • At the maximum height of the swing (amplitude), the pendulum comes to an instantaneous standstill, and so \(\displaystyle\frac{\mathrm{d}\theta}{\mathrm{d}t}=0\).

  • As the pendulum swings clockwise, \(\theta\) is decreasing and so \(\displaystyle\frac{\mathrm{d}\theta}{\mathrm{d}t}<0\).

  • As the pendulum swings anticlockwise, \(\theta\) is increasing and so \(\displaystyle\frac{\mathrm{d}\theta}{\mathrm{d}t}>0\).

  • On the downswing the pendulum picks up speed. The angular speed (the magnitude of \(\displaystyle\frac{\mathrm{d}\theta}{\mathrm{d}t}\)) is greatest at the mid-point of each swing.
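The observations in the list above can be checked numerically under a simplifying assumption that is not made in the notes: for small swings, the angle is modelled as \(\theta(t)=\theta_0\cos(\omega t)\). The values of `THETA0` and `OMEGA` are hypothetical, and the rate of change is estimated with a central difference rather than an exact derivative.

```python
import math

THETA0 = 0.2  # assumed release angle in radians (hypothetical value)
OMEGA = 2.0   # assumed angular frequency in rad/s (hypothetical value)

def theta(t):
    """Assumed small-angle model of a pendulum released from rest at THETA0."""
    return THETA0 * math.cos(OMEGA * t)

def rate(f, t, h=1e-6):
    """Central-difference estimate of df/dt."""
    return (f(t + h) - f(t - h)) / (2 * h)

period = 2 * math.pi / OMEGA
checkpoints = [
    ("release (maximum angle)", 0.0),
    ("mid-point, swinging clockwise", period / 4),
    ("opposite extreme", period / 2),
    ("mid-point, swinging anticlockwise", 3 * period / 4),
]
for label, t in checkpoints:
    print(f"{label}: theta = {theta(t):+.3f}, dtheta/dt = {rate(theta, t):+.3f}")
```

In this model the estimated rate is zero at the extremes, negative on the clockwise swing, positive on the anticlockwise swing, and largest in magnitude as the pendulum passes through the vertical.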

9.3. Second and Higher Derivatives

We can differentiate a function repeatedly. For example, we might differentiate the function \(3x^2+5x^3\) w.r.t. \(x\) twice:

\[\frac{\mathrm{d}}{\mathrm{d}x}\Big(\displaystyle\frac{\mathrm{d}}{\mathrm{d}x}(3x^2+5x^3)\Big)=\displaystyle\frac{\mathrm{d}}{\mathrm{d}x}(6x+15x^2)=6+30x\]

We call this result the “second derivative” w.r.t. \(x\), and we write \(\displaystyle\frac{\mathrm{d}^2}{\mathrm{d}x^2}(3x^2+5x^3)=6+30x\). In general, the \(n^{th}\) derivative is denoted by \(\displaystyle\frac{\mathrm{d}^n}{\mathrm{d}x^n}\). We have already seen that the notation \(f'(x)\) can be used to denote the first derivative \(\displaystyle\frac{\mathrm{d}f}{\mathrm{d}x}\), and this notation can be extended to higher derivatives:

\[f''(x)=\displaystyle\frac{\mathrm{d}^2f}{\mathrm{d}x^2}, \quad f'''(x)=\displaystyle\frac{\mathrm{d}^3f}{\mathrm{d}x^3},\quad \text{etc}\]

The dash notation becomes a bit cumbersome for higher derivatives, so we write \(f^{(n)}(x)=\displaystyle\frac{\mathrm{d}^nf}{\mathrm{d}x^n}\).

For example, \(f^{(4)}(x)=f''''(x)=\displaystyle\frac{\mathrm{d}^4f}{\mathrm{d}x^4}\)
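For polynomials, repeated differentiation is easy to sketch in code. The representation below (a list in which entry `k` holds the coefficient of \(x^k\)) and the function name `poly_derivative` are illustrative choices made for this example.

```python
def poly_derivative(coeffs, n=1):
    """Return the coefficients of the n-th derivative of a polynomial.

    coeffs[k] is the coefficient of x**k, so 3x^2 + 5x^3 is [0, 0, 3, 5].
    """
    for _ in range(n):
        # d/dx of a*x^k is k*a*x^(k-1); the constant term drops out
        coeffs = [k * a for k, a in enumerate(coeffs)][1:]
    return coeffs

p = [0, 0, 3, 5]              # 3x^2 + 5x^3
print(poly_derivative(p, 1))  # [0, 6, 15], i.e. 6x + 15x^2
print(poly_derivative(p, 2))  # [6, 30], i.e. 6 + 30x
```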

There are still more ways to write the derivative of a function, and we will introduce some of them in the chapter of the notes about partial derivatives.

Dot notation (due to Newton) is used for differentiation with respect to time:

\[\dot{x}=\displaystyle\frac{\mathrm{d}x}{\mathrm{d}t}, \quad \ddot{x}=\displaystyle\frac{\mathrm{d}^2x}{\mathrm{d}t^2}\]

9.4. Stationary Points

Definition

The point \((x_0,f(x_0))\) is a stationary point of \(f(x)\) if \(f'(x_0)=0\).

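To classify the stationary points, we can look at the slope of the curve a small distance \(\epsilon\) either side of them:

  • local maximum: \(f'(x_0-\epsilon)>0\) and \(f'(x_0+\epsilon)<0\)

  • local minimum: \(f'(x_0-\epsilon)<0\) and \(f'(x_0+\epsilon)>0\)

  • stationary point of inflection: \(f'(x_0-\epsilon)\) and \(f'(x_0+\epsilon)\) have the same sign

We can also see this graphically: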

../_images/StationaryPoints2.png

9.4.1. First Derivative Test

Worked Example

Find and classify the stationary points of \(f(x)=-x^3+9x^2−24x+20\):

\[f'(x)=−3x^2+18x−24=−3(x−2)(x−4)\]

The stationary points are at \(x=2,4\)

Check the sign of the gradient:

\[\begin{array}{c|ccccc} x & 1 & 2 & 3 & 4 & 5 \\ \hline f'(x) & - & 0 & + & 0 & - \end{array}\]

We know that the gradient changes sign only at the points \(x\)=2 and \(x\)=4, so testing the point \(x\)=3 tells us the sign of the gradient immediately right of \(x\)=2 and immediately left of \(x\)=4.

From the table above, we can infer that \(x\)=2 is a local minimum and \(x\)=4 is a local maximum.
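The sign check in this worked example can be sketched as a small routine. The function name `first_derivative_test` and the default step `eps` are illustrative choices, and the step must be small enough that no other stationary point lies within `eps` of the one being tested.

```python
def f_prime(x):
    """Derivative of f(x) = -x^3 + 9x^2 - 24x + 20."""
    return -3 * x**2 + 18 * x - 24

def first_derivative_test(df, x0, eps=1e-3):
    """Classify a stationary point x0 from the sign of df either side of it."""
    left, right = df(x0 - eps), df(x0 + eps)
    if left < 0 < right:
        return "local minimum"
    if left > 0 > right:
        return "local maximum"
    return "no sign change"

for x0 in (2, 4):
    print(x0, first_derivative_test(f_prime, x0))
```

The routine reports a local minimum at \(x=2\) and a local maximum at \(x=4\), matching the conclusion drawn from the sign table.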

9.4.2. Second derivative test

The second derivative measures the rate of change of the slope, since

\[\frac{\mathrm{d}^2f}{\mathrm{d}x^2}=\displaystyle\frac{\mathrm{d}}{\mathrm{d}x}\displaystyle\frac{\mathrm{d}f}{\mathrm{d}x}=\displaystyle\frac{\mathrm{d}s}{\mathrm{d}x}\]

where \(s\) measures the slope.

Thus, the second derivative is a measure of concavity.

  • When \(\displaystyle\frac{\mathrm{d}^2f}{\mathrm{d}x^2}>0\) the slope is increasing: we say that the function is concave upwards. For example, the function \(x^2\) is concave upwards on its entire domain. Its slope is always increasing: \(\displaystyle\frac{\mathrm{d}^2}{\mathrm{d}x^2}(x^2)=2>0\) \(\forall\, x\).

  • When \(\displaystyle\frac{\mathrm{d}^2f}{\mathrm{d}x^2}<0\) the slope is decreasing: we say that the function is concave downwards. For example, the function \(-x^2\) is concave downwards on its entire domain. Its slope is always decreasing.

  • When \(\displaystyle\frac{\mathrm{d}^2f}{\mathrm{d}x^2}=0\) the slope of the function is instantaneously not changing.

We can therefore use the second derivative to classify local maxima/minima:

  • If the function is concave upwards at a stationary point, it is a local minimum.

  • If the function is concave downwards at a stationary point, it is a local maximum.

A point of inflection is a point where the concavity of a function \(f\) changes sign. Therefore, at a point of inflection \(x=c\), we have \(f''(c)=0\). However, it is important to note that \(f''(c)=0\) does not guarantee that a point is an inflection, as some functions that are concave up (or down) everywhere else also satisfy this criterion. For example, \(f(x)=x^4\) has \(f''(0)=0\), but \(x=0\) is a local minimum. In this case, further testing using the first derivative test is needed.
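A minimal sketch of the second derivative test is given below, including the inconclusive case. It is applied to the cubic from the earlier worked example, whose second derivative is \(-6x+18\), and to \(x^4\), which is used here purely as an illustration of a stationary point where the second derivative vanishes. The function names are hypothetical.

```python
def second_derivative_test(d2f, x0):
    """Classify a stationary point x0 from the sign of the second derivative."""
    value = d2f(x0)
    if value > 0:
        return "local minimum"   # concave upwards at x0
    if value < 0:
        return "local maximum"   # concave downwards at x0
    return "inconclusive"        # f''(x0) = 0: another test is needed

def cubic_second(x):
    """Second derivative of f(x) = -x^3 + 9x^2 - 24x + 20."""
    return -6 * x + 18

def quartic_second(x):
    """Second derivative of f(x) = x^4."""
    return 12 * x**2

print(second_derivative_test(cubic_second, 2))    # local minimum
print(second_derivative_test(cubic_second, 4))    # local maximum
print(second_derivative_test(quartic_second, 0))  # inconclusive
```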

9.5. Differentiation Rules

9.5.1. Sum rule

Definition

\[\frac{\mathrm{d}}{\mathrm{d}x}(u+v)=\frac{\mathrm{d}u}{\mathrm{d}x}+\frac{\mathrm{d}v}{\mathrm{d}x}\]

This result says that the derivative of a sum is equal to the sum of the derivatives.

For example, \(\displaystyle\frac{\mathrm{d}}{\mathrm{d}x}(x^5+x^3)=5x^4+3x^2\)

9.5.2. Product rule

Definition

\[\frac{\mathrm{d}}{\mathrm{d}x}(uv)=u\frac{\mathrm{d}v}{\mathrm{d}x}+v\frac{\mathrm{d}u}{\mathrm{d}x}\]

A special case is when one of the functions is a constant \(k\). Then, we have

\(\displaystyle\frac{\mathrm{d}}{\mathrm{d}x}(kf(x))=k\displaystyle\frac{\mathrm{d}f}{\mathrm{d}x}+0.\)

For example, \(\displaystyle\frac{\mathrm{d}}{\mathrm{d}x}(3x^5)=15x^4.\)

Worked Example

Calculate \(\displaystyle\frac{\mathrm{d}}{\mathrm{d}x}(x^3\sin(x))\):

\[\frac{\mathrm{d}}{\mathrm{d}x}(x^3\sin(x))=x^3\frac{\mathrm{d}}{\mathrm{d}x}(\sin(x))+\sin(x)\frac{\mathrm{d}}{\mathrm{d}x}x^3=x^3\cos(x)+3x^2\sin(x)\]
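As a sanity check, the product rule result can be compared with a numerical estimate of the derivative. The helpers `product_rule` and `central_difference` below are hypothetical names introduced for this sketch, and the test point \(x=1.3\) is arbitrary.

```python
import math

def product_rule(u, du, v, dv):
    """Return d(uv)/dx as a function, given u, v and their derivatives."""
    return lambda x: u(x) * dv(x) + v(x) * du(x)

def central_difference(f, x, h=1e-5):
    """Numerical estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

# u = x^3 with u' = 3x^2, and v = sin(x) with v' = cos(x)
d_product = product_rule(lambda x: x**3, lambda x: 3 * x**2, math.sin, math.cos)

x = 1.3
numeric = central_difference(lambda s: s**3 * math.sin(s), x)
print(d_product(x), numeric)  # the two values should agree closely
```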

9.5.3. Quotient rule

Definition

\[\frac{\mathrm{d}}{\mathrm{d}x}(\frac{u}{v})=\frac{v\frac{\mathrm{d}u}{\mathrm{d}x}−u\displaystyle\frac{\mathrm{d}v}{\mathrm{d}x}}{v^2}\]

To prove this, let \(f(x)=\displaystyle\frac{u(x)}{v(x)}\) and rearrange to give \(u(x)=f(x)v(x)\).

Then differentiate both sides w.r.t. \(x\), applying the product rule to calculate the result on the right: \(u=fv\) gives \(u'=fv'+vf'\).

Rearranging for \(f'\) gives \(f'=\displaystyle\frac{u'-fv'}{v}\), which we now need to write entirely in terms of \(u\), \(v\) and their derivatives.

We can substitute in \(f=u/v\) to obtain the final result:

\[f'=\frac{u'−\displaystyle\frac{u}{v}v′}{v}=\frac{u'v-uv'}{v^2}\]

Worked Example

Use the quotient rule to obtain the result for \(\displaystyle\frac{\mathrm{d}}{\mathrm{d}x}\tan(x)\)

Let \(u=\sin(x)\), \(v=\cos(x)\). Then, by the quotient rule,

\[\frac{\mathrm{d}}{\mathrm{d}x}\Big(\frac{u}{v}\Big)=\frac{v\frac{\mathrm{d}u}{\mathrm{d}x}-u\frac{\mathrm{d}v}{\mathrm{d}x}}{v^2} =\frac{\cos(x)\cos(x)-\sin(x)(-\sin(x))}{\cos^2(x)}=\frac{\cos^2(x)+\sin^2(x)}{\cos^2(x)}=\frac{1}{\cos^2(x)}=\sec^2(x)\]
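The same calculation can be sketched in code by applying the quotient rule formula to \(u=\sin(x)\) and \(v=\cos(x)\) and comparing with \(\sec^2(x)\) at a few sample points. The helper name `quotient_rule` is an illustrative choice.

```python
import math

def quotient_rule(u, du, v, dv):
    """Return d(u/v)/dx as a function, given u, v and their derivatives."""
    def derivative(x):
        return (v(x) * du(x) - u(x) * dv(x)) / v(x) ** 2
    return derivative

# u = sin(x) with u' = cos(x), and v = cos(x) with v' = -sin(x)
d_tan = quotient_rule(math.sin, math.cos, math.cos, lambda x: -math.sin(x))

for x in (0.0, 0.5, 1.0):
    print(x, d_tan(x), 1 / math.cos(x) ** 2)  # the last two columns should match
```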

9.5.4. Chain rule

Definition

The chain rule states that if two functions \(f=f(g)\) and \(g=g(x)\) are both differentiable, then

\[\frac{\mathrm{d}f}{\mathrm{d}x}=\frac{\mathrm{d}f}{\mathrm{d}g}\displaystyle\frac{\mathrm{d}g}{\mathrm{d}x}\]

An important special case can be deduced by noting that \(\displaystyle\frac{\mathrm{d}f}{\mathrm{d}f}=1\), which gives:

\[\frac{\mathrm{d}f}{\mathrm{d}x}\frac{\mathrm{d}x}{\mathrm{d}f}=1\]

This result can be motivated by noting that

\[\lim_{\Delta x \rightarrow 0}\frac{\Delta f}{\Delta x} = \lim_{\Delta x \rightarrow 0}\frac{\Delta f}{\Delta g}\frac{\Delta g}{\Delta x}\]

We have to take great care when treating derivatives like fractions involving finite quantities - the anticipated results do not always hold true, as we will see when we study partial differentiation.

Worked Example

Suppose that we wish to differentiate the following function w.r.t. \(x\) :

\[f=\sin^2(x)+\displaystyle\frac{1}{\sin(x)}\]

We know how to differentiate \(\sin(x)\) w.r.t. \(x\) and we know how to differentiate \(g^2+\displaystyle\frac{1}{g}\) w.r.t. \(g\).

This motivates us to introduce the change of variables \(g=\sin(x)\) so that we may write \(f=g^2+\displaystyle\frac{1}{g}\).

Then, we have the results:

\[\begin{split}\frac{\mathrm{d}f}{\mathrm{d}g} &= 2g-\frac{1}{g^2} \\ \frac{\mathrm{d}g}{\mathrm{d}x} &= \cos(x)\end{split}\]

Intuitively, we hope to combine these two results to find the rate of change of \(f\) w.r.t. \(x\).

The chain rule gives

\[\frac{\mathrm{d}f}{\mathrm{d}x} = \Big(2g - \frac{1}{g^2}\Big) \cos(x)\]

where \(g = \sin(x)\). Writing the expression fully in terms of \(x\) provides the answer:

\[\frac{\mathrm{d}f}{\mathrm{d}x} = \Bigg(2\sin(x) - \frac{1}{\sin^2(x)}\Bigg) \cos(x)\]
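The two-step structure of this worked example maps directly onto code: one function for the derivative of \(f\) with respect to \(g\), one for the derivative of \(g\) with respect to \(x\), and their product for the derivative of \(f\) with respect to \(x\). The sketch below compares that product with a numerical estimate at the arbitrary point \(x=0.7\). All function names are illustrative.

```python
import math

def f(x):
    return math.sin(x) ** 2 + 1 / math.sin(x)

def df_dg(g):
    """Derivative of g^2 + 1/g with respect to g."""
    return 2 * g - 1 / g**2

def dg_dx(x):
    """Derivative of g = sin(x) with respect to x."""
    return math.cos(x)

def df_dx(x):
    """Chain rule: evaluate df/dg at g = sin(x), then multiply by dg/dx."""
    return df_dg(math.sin(x)) * dg_dx(x)

def central_difference(func, x, h=1e-5):
    """Numerical estimate of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 0.7
print(df_dx(x), central_difference(f, x))  # the two values should agree closely
```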

9.6. Parametric Differentiation

We can express the equation of a circle in the form \(x=\cos(t)\), \(y=\sin(t)\). This is known as a parametric representation. By varying the parameter \(t\), the entire circle is mapped out. In principle, any curve can be parameterised in terms of a single parameter, regardless of the number of coordinates. To describe a surface, two parameters are required. For example, the surface of a sphere can be described by varying two parameters such as the latitude and longitude.

According to the chain rule, we can write:

(9.3)\[\frac{\mathrm{d}y}{\mathrm{d}x}=\frac{\mathrm{d}y}{\mathrm{d}t}\displaystyle\frac{\mathrm{d}t}{\mathrm{d}x}= \frac{\mathrm{d}y}{\mathrm{d}t}\Bigg/ \frac{\mathrm{d}x}{\mathrm{d}t}\]

So, we obtain a result for \(\displaystyle\frac{\mathrm{d}y}{\mathrm{d}x}\) in terms of the rate of change of each variable w.r.t. parameter \(t\). This result is known as parametric differentiation. The result is obtained fully in terms of the parameter.

Worked Example

For the unit circle parameterisation, calculate \(\displaystyle\frac{\mathrm{d}y}{\mathrm{d}x}\) using parametric differentiation.

Verify your answer by applying implicit differentiation to the equation relating \(y\) and \(x\).

For \(x=\cos(t)\), \(y=\sin(t)\),

\(\displaystyle\frac{\mathrm{d}x}{\mathrm{d}t}=−\sin(t)\), \(\displaystyle\frac{\mathrm{d}y}{\mathrm{d}t}=\cos(t)\)

So, \(\displaystyle\frac{\mathrm{d}y}{\mathrm{d}x}=−\cot(t)\)

In this case it is straightforward to write the result in terms of \(x\) and \(y\)

\[\frac{\mathrm{d}y}{\mathrm{d}x}=−\frac{x}{y}\]

The equation relating \(x\) and \(y\) is \(x^2+y^2=1\)

Differentiating throughout w.r.t. \(x\) we obtain:

\[2x+2y\frac{\mathrm{d}y}{\mathrm{d}x}=0\]

and rearranging provides again the result \(\displaystyle\frac{\mathrm{d}y}{\mathrm{d}x}=-\displaystyle\frac{x}{y}\)
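The agreement between the parametric and implicit results can be checked numerically at a sample parameter value. The helper `dy_dx_parametric` is a hypothetical name, and the value \(t=0.9\) is arbitrary. Note that the quotient is undefined wherever \(\frac{\mathrm{d}x}{\mathrm{d}t}=0\), which for the circle happens at \(t=0\) and \(t=\pi\).

```python
import math

def dy_dx_parametric(dx_dt, dy_dt, t):
    """Slope dy/dx of a parametric curve at parameter t, via (dy/dt) / (dx/dt)."""
    return dy_dt(t) / dx_dt(t)

t = 0.9
# x = cos(t) gives dx/dt = -sin(t); y = sin(t) gives dy/dt = cos(t)
slope = dy_dx_parametric(lambda s: -math.sin(s), math.cos, t)

x, y = math.cos(t), math.sin(t)
print(slope, -1 / math.tan(t), -x / y)  # all three values should match
```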

9.7. Derivatives of Inverse Functions

In this scenario we wish to calculate \(\displaystyle\frac{\mathrm{d}y}{\mathrm{d}x}\) where \(y = f^{-1}(x)\) and we know how to differentiate function \(f\).

Worked example

Calculate \(\displaystyle\frac{\mathrm{d}}{\mathrm{d}x}\ln(x)\)

We let \(y=\ln(x)\) so that \(x=e^y\). This means that \(\displaystyle\frac{\mathrm{d}x}{\mathrm{d}y}=e^y\), and by the chain rule:

\[\frac{\mathrm{d}y}{\mathrm{d}x}=1\Big/\frac{\mathrm{d}x}{\mathrm{d}y}=\frac{1}{e^y} = e^{-y}\]

This is not an acceptable result because the derivative has been given in terms of the dependent variable - we need to rewrite in terms of the independent variable \(x\).

For some problems of this type, it can be quite difficult, but here it is easy since \(e^y=x\).

Thus the final result is:

\[\frac{\mathrm{d}}{\mathrm{d}x}\Big(\ln(x)\Big)=\frac{1}{x}\]

another important (and familiar) result.
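The general recipe used here, \(\frac{\mathrm{d}y}{\mathrm{d}x}=1\Big/\frac{\mathrm{d}x}{\mathrm{d}y}\) evaluated at \(y=f^{-1}(x)\), can be sketched as a small function. The name `inverse_derivative` is an illustrative choice, and the sketch assumes that both the derivative of \(f\) and the inverse function are available as callables.

```python
import math

def inverse_derivative(df, f_inv, x):
    """Derivative of f_inv at x, using dy/dx = 1 / (dx/dy) with y = f_inv(x)."""
    y = f_inv(x)
    return 1 / df(y)

# f = exp, so f' = exp and f_inv = ln; the result should equal 1/x
for x in (0.5, 1.0, 4.0):
    print(x, inverse_derivative(math.exp, math.log, x), 1 / x)
```

The same function applies to other inverses. For example, passing `math.cos` as the derivative of sine together with `math.asin` as the inverse returns \(1/\cos(\arcsin(x))\), which is the derivative of \(\arcsin(x)\).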

9.8. Implicit Differentiation

Up to now, we have been calculating the derivatives of functions given explicitly in terms of the independent variable in the manner \(y=y(x)\). However, there are many occasions where we want to calculate the derivative of a function \(y\) that is implicitly related to the independent variable \(x\) in the manner \(f(x,y)=0\).

In that case, we differentiate the entire expression with respect to the independent variable and apply the chain rule to differentiate terms involving the dependent variable.

Worked Example

Let's calculate the result \(\displaystyle\frac{\mathrm{d}}{\mathrm{d}x}\Big(x^n\Big)\) for \(n \in \mathbb{R}\).

As usual, we let \(y=x^n\) and then we rearrange to a convenient form.

In this case we take the natural logarithm of both sides (assuming \(x>0\) so that the logarithms are defined), \(\ln(y)=n\ln(x)\).

Then we differentiate the whole expression w.r.t. \(x\)

\[\frac{\mathrm{d}}{\mathrm{d}x}\ln(y)=\frac{n}{x}\]

We apply the chain rule to the left-hand side:

\[\frac{\mathrm{d}}{\mathrm{d}x}\ln(y)=\frac{1}{y}\frac{\mathrm{d}y}{\mathrm{d}x}\]

Combining the two results and rearranging gives:

\[\frac{\mathrm{d}y}{\mathrm{d}x}=n\frac{y}{x}\]

and finally, rewriting all in terms of \(x\) gives:

\[\frac{\mathrm{d}y}{\mathrm{d}x}=n\frac{x^n}{x}=nx^{n-1}\]
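The result can be checked numerically for a few non-integer and negative exponents. The sketch below evaluates the intermediate form \(n\frac{y}{x}\) obtained from the logarithmic step, the final form \(nx^{n-1}\), and a central-difference estimate. It assumes \(x>0\), and the function names and sample values are illustrative.

```python
import math

def power_rule_via_logs(x, n):
    """dy/dx for y = x**n with x > 0, using dy/dx = n * y / x."""
    y = x ** n
    return n * y / x

def central_difference(f, x, h=1e-6):
    """Numerical estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0
for n in (0.5, -1.0, math.pi):
    via_logs = power_rule_via_logs(x, n)
    closed_form = n * x ** (n - 1)
    numeric = central_difference(lambda s: s ** n, x)
    print(n, via_logs, closed_form, numeric)  # the three values should agree closely
```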